* [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4
@ 2014-11-14 13:05 Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
                   ` (20 more replies)
  0 siblings, 21 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

As of version 3, the qcow2 file format supports different widths for
refcount entries, ranging from 1 to 64 bits (only powers of two).
Currently, qemu only supports 16 bits, which is the only width supported
by version 2 (compat=0.10) images.

This series adds support to qemu for all other valid refcount orders.
This is mainly done by adding two function pointers into the
BDRVQcowState structure for reading and writing refcount values
independently of the current refcount entry width; all in-memory
refcount arrays (mostly cached refcount blocks) are now void pointers
and are accessed through these functions alone.
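
A minimal sketch of the accessor idea described above, not the code from
this series: the type and function names (RefcountGetFn, RefcountSetFn,
RefcountOps, refcount_ops_for_order) are hypothetical, and only the 8 and
16 bit widths are covered. It assumes entries are stored big-endian, as
the existing be16_to_cpu() call in update_refcount() suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor types; the names are illustrative only. */
typedef uint64_t RefcountGetFn(const void *array, uint64_t index);
typedef void RefcountSetFn(void *array, uint64_t index, uint64_t value);

/* 8 bit entries (refcount order 3): one byte each, no byte swapping. */
static uint64_t get_refcount_8(const void *array, uint64_t index)
{
    return ((const uint8_t *)array)[index];
}

static void set_refcount_8(void *array, uint64_t index, uint64_t value)
{
    assert(value <= UINT8_MAX);
    ((uint8_t *)array)[index] = (uint8_t)value;
}

/* 16 bit entries (refcount order 4), kept big-endian in memory. */
static uint64_t get_refcount_16(const void *array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)array + 2 * index;
    return ((uint64_t)p[0] << 8) | p[1];
}

static void set_refcount_16(void *array, uint64_t index, uint64_t value)
{
    uint8_t *p = (uint8_t *)array + 2 * index;
    assert(value <= UINT16_MAX);
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

/* The pair of function pointers that would live in the driver state. */
typedef struct RefcountOps {
    RefcountGetFn *get;
    RefcountSetFn *set;
} RefcountOps;

/* Returns 0 on success, -1 for orders this sketch does not cover. */
static int refcount_ops_for_order(int order, RefcountOps *ops)
{
    switch (order) {
    case 3:
        ops->get = get_refcount_8;
        ops->set = set_refcount_8;
        return 0;
    case 4:
        ops->get = get_refcount_16;
        ops->set = set_refcount_16;
        return 0;
    default:
        return -1;
    }
}
```

Callers only ever see a void pointer and an index, so the entry width is
decided once, when the accessors are selected, rather than at every
access site.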

Thanks to previous work of making the qemu code agnostic of e.g. the
number of refcount entries per refcount block, the rest is fairly
trivial. The most complex patch in this series is patch 18 which
implements changing the refcount order through qemu-img amend.

To test different refcount widths, simply invoke the qemu-iotests check
program with -o refcount_width=${your_desired_width}. The final test in
this series adds some tests for operations which do not work with
certain refcount orders and for refcount order amendment.


As of this version, this series depends on version 3 of my
"chardev: Add -qmp-pretty" series (due to different test output of test
067, which makes changing it here much nicer).


v2:
- Patch 2:
  - Added justification for always emitting refcount-width to the commit
    message [Eric]
  - Due to making this series dependent on the -qmp-pretty series, the
    reference output of iotest 067 is changed differently
- Patch 3: Added justification for using int64_t instead of uint64_t to
  the commit message [Eric]
- Patch 6: This replaces patch 7 from v1.
  - Added an assertion and an explanation why refcount_array_byte_size()
    cannot overflow [Eric]
  - Added a helper function for reallocating a refcount array (this is
    what we really want); make use of that function
- Patch 7: Was patch 6 in v1, swapped the order to reduce the diffstat
  and to avoid having to explain why a truncating division does not
  truncate here [Eric]
  - Added overflow check in update_refcount()
  - Consistent use of int64_t for refcounts [Eric]
  - Use g_try_malloc0_n() instead of g_try_malloc0() once to prevent a
    multiplication overflow [Eric]
  - Dropped the temporary on_disk_refblock buffer in
    rebuild_refcount_structure(); just directly write the in-memory
    refcount array to disk
- Patch 8: Added an assertion in set_refcount_ro6() that the MSb of the
  given value is not set (which makes the value fit into an int64_t)
  [Eric]
- Patch 9: Dropped the refcount width specified by the image header from
  the output to prevent overflows [Eric]
- Patch 10: Kept the same, because this calculation was never meant to
  be exact
- Patch 11:
  - Added comments why certain refcount_widths do not work with some
    tests [Eric]
  - Changed the _unsupported_imgopts argument for allowing only
    refcount_width=16 to disallow any other refcount width instead of
    listing them all (your proposal did not work, though, Eric)
  - Fixed comment in test 007 [Eric]
  - 079 does support any refcount width, we only need to use
    _make_test_img instead of calling qemu-img create manually, so do
    that
  - Disallowing 090 for refcount_width=1 was an artifact from before v1
    (where qcow2_alloc_bytes() simply returned an error when a refcount
    overflow was about to occur, instead of skipping to the next
    cluster), so remove that limitation
- Patch 12: Dropped 079 reference output change because patch 11 took
  care of that (by using _make_test_img)
- Patch 14: Removed trailing full stops in error_report() calls [Eric]
- Patch 17:
  - Fixed indentation for the function header of qcow2_amend_helper_cb()
    [Eric]
  - Renamed arguments of qcow2_amend_helper_cb() to make it more obvious
    what they mean in the context of this function [Eric, in a way]
  - Added two assertions regarding total_operations [Eric]
- Patch 18:
  - Add a typedef for the function pointer which is given to
    walk_over_reftable(); the benefit is that this makes documentation
    of the parameters easier [Eric]
  - Add an "allocated" parameter to that RefblockFinishOp and
    consequently to walk_over_reftable() so that changes in the original
    refcount structures can be tracked; if anything has been changed,
    walk_over_reftable() is rerun with alloc_refblock() until no new
    allocations were necessary anymore
  - Some changes to alloc_refblock() and flush_refblock() to accommodate
    this change
  - Call the status_cb additionally once at the end of
    walk_over_reftable() [Eric]
  - Fix leak of the new reftable clusters on error [Eric]
- Patch 21:
  - Added comment that snapshotting an image with a small refcount width
    fails for now, but may work in the future [Eric]
  - %s/shoud/should/ [Eric]
  - 64s/qocw2/qcow2/
  - Added comment that failing to amend an image because of refcount
    overflows may work in the future [Eric]
  - 232s/entriy/entry/
  - Added a test for multiple walks during refcount width amendment (for
    the changes to patch 18)


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/21:[----] [--] 'qcow2: Add two new fields to BDRVQcowState'
002/21:[0015] [FC] 'qcow2: Add refcount_width to format-specific info'
003/21:[----] [--] 'qcow2: Use 64 bits for refcount values'
004/21:[----] [--] 'qcow2: Respect error in qcow2_alloc_bytes()'
005/21:[----] [--] 'qcow2: Refcount overflow and qcow2_alloc_bytes()'
006/21:[down] 'qcow2: Helper for refcount array reallocation'
007/21:[0062] [FC] 'qcow2: Helper function for refcount modification'
008/21:[0003] [FC] 'qcow2: More helpers for refcount modification'
009/21:[0004] [FC] 'qcow2: Open images with refcount order != 4'
010/21:[----] [--] 'qcow2: refcount_order parameter for qcow2_create2'
011/21:[0096] [FC] 'iotests: Prepare for refcount_width option'
012/21:[0018] [FC] 'qcow2: Allow creation with refcount order != 4'
013/21:[----] [--] 'block: Add opaque value to the amend CB'
014/21:[0018] [FC] 'qcow2: Use error_report() in qcow2_amend_options()'
015/21:[----] [--] 'qcow2: Use abort() instead of assert(false)'
016/21:[----] [--] 'qcow2: Split upgrade/downgrade paths for amend'
017/21:[0014] [FC] 'qcow2: Use intermediate helper CB for amend'
018/21:[0159] [FC] 'qcow2: Add function for refcount order amendment'
019/21:[----] [--] 'qcow2: Invoke refcount order amendment function'
020/21:[----] [--] 'qcow2: Point to amend function in check'
021/21:[0051] [FC] 'iotests: Add test for different refcount widths'


Max Reitz (21):
  qcow2: Add two new fields to BDRVQcowState
  qcow2: Add refcount_width to format-specific info
  qcow2: Use 64 bits for refcount values
  qcow2: Respect error in qcow2_alloc_bytes()
  qcow2: Refcount overflow and qcow2_alloc_bytes()
  qcow2: Helper for refcount array reallocation
  qcow2: Helper function for refcount modification
  qcow2: More helpers for refcount modification
  qcow2: Open images with refcount order != 4
  qcow2: refcount_order parameter for qcow2_create2
  iotests: Prepare for refcount_width option
  qcow2: Allow creation with refcount order != 4
  block: Add opaque value to the amend CB
  qcow2: Use error_report() in qcow2_amend_options()
  qcow2: Use abort() instead of assert(false)
  qcow2: Split upgrade/downgrade paths for amend
  qcow2: Use intermediate helper CB for amend
  qcow2: Add function for refcount order amendment
  qcow2: Invoke refcount order amendment function
  qcow2: Point to amend function in check
  iotests: Add test for different refcount widths

 block.c                          |   4 +-
 block/qcow2-cluster.c            |  23 +-
 block/qcow2-refcount.c           | 946 +++++++++++++++++++++++++++++++++------
 block/qcow2.c                    | 256 ++++++++---
 block/qcow2.h                    |  24 +-
 include/block/block.h            |   4 +-
 include/block/block_int.h        |   4 +-
 qapi/block-core.json             |   5 +-
 qemu-img.c                       |   5 +-
 tests/qemu-iotests/007           |   3 +
 tests/qemu-iotests/015           |   2 +
 tests/qemu-iotests/026           |   7 +
 tests/qemu-iotests/029           |   1 +
 tests/qemu-iotests/049.out       | 112 ++---
 tests/qemu-iotests/051           |   3 +
 tests/qemu-iotests/058           |   2 +
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  14 +-
 tests/qemu-iotests/065           |  23 +-
 tests/qemu-iotests/067           |   2 +
 tests/qemu-iotests/067.out       |   5 +
 tests/qemu-iotests/079           |  10 +-
 tests/qemu-iotests/079.out       |  38 +-
 tests/qemu-iotests/080           |   2 +
 tests/qemu-iotests/082.out       |  48 +-
 tests/qemu-iotests/085.out       |  38 +-
 tests/qemu-iotests/089           |   2 +
 tests/qemu-iotests/089.out       |   2 +
 tests/qemu-iotests/108           |   2 +
 tests/qemu-iotests/112           | 252 +++++++++++
 tests/qemu-iotests/112.out       | 131 ++++++
 tests/qemu-iotests/common.filter |   3 +-
 tests/qemu-iotests/group         |   1 +
 33 files changed, 1644 insertions(+), 331 deletions(-)
 create mode 100755 tests/qemu-iotests/112
 create mode 100644 tests/qemu-iotests/112.out

-- 
1.9.3


* [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
@ 2014-11-14 13:05 ` Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add two new fields regarding refcount information (the bit width of
every entry and the maximum refcount value) to the BDRVQcowState.
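
As a worked illustration, the computation in the qcow2_open() hunk below
can be restated as a standalone helper. This is a sketch: the
RefcountLimits struct and the refcount_limits() name are hypothetical
and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two derived values. */
typedef struct RefcountLimits {
    int bits;      /* width of one refcount entry in bits */
    uint64_t max;  /* largest refcount value accepted */
} RefcountLimits;

static RefcountLimits refcount_limits(int order)
{
    RefcountLimits l;

    /* Valid orders are 0 to 6, i.e. widths of 1 to 64 bits. */
    assert(order >= 0 && order <= 6);
    l.bits = 1 << order;

    if (order < 6) {
        l.max = (UINT64_C(1) << l.bits) - 1;
    } else {
        /* A shift by 64 would overflow; cap at INT64_MAX so that values
         * still fit into an int64_t next to negative -errno codes. */
        l.max = INT64_MAX;
    }
    return l;
}
```

For example, order 0 gives a 1 bit entry with a maximum of 1, and order 4
gives the familiar 16 bit entry with a maximum of 0xffff.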

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-refcount.c | 2 +-
 block/qcow2.c          | 9 +++++++++
 block/qcow2.h          | 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 9afdb40..6016211 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -584,7 +584,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 
         refcount = be16_to_cpu(refcount_block[block_index]);
         refcount += addend;
-        if (refcount < 0 || refcount > 0xffff) {
+        if (refcount < 0 || refcount > s->refcount_max) {
             ret = -EINVAL;
             goto fail;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index d120494..f57aff9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -684,6 +684,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
     s->refcount_order = header.refcount_order;
+    s->refcount_bits = 1 << s->refcount_order;
+    if (s->refcount_order < 6) {
+        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
+    } else {
+        /* The above shift would overflow with s->refcount_bits == 64;
+         * furthermore, we do not want to use UINT64_MAX because refcounts will
+         * be passed around in int64_ts (negative values for -errno) */
+        s->refcount_max = INT64_MAX;
+    }
 
     if (header.crypt_method > QCOW_CRYPT_AES) {
         error_setg(errp, "Unsupported encryption method: %" PRIu32,
diff --git a/block/qcow2.h b/block/qcow2.h
index 6e39a1b..4d8c902 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -258,6 +258,8 @@ typedef struct BDRVQcowState {
     int qcow_version;
     bool use_lazy_refcounts;
     int refcount_order;
+    int refcount_bits;
+    uint64_t refcount_max;
 
     bool discard_passthrough[QCOW2_DISCARD_MAX];
 
-- 
1.9.3


* [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
@ 2014-11-14 13:05 ` Max Reitz
  2014-11-15 16:00   ` Eric Blake
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values Max Reitz
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add the bit width of every refcount entry to the format-specific
information.

In contrast to lazy_refcounts and the corrupt flag, this should always
be emitted, even for compat=0.10, although that version does not support
any refcount width other than 16 bits. This is because if a boolean is
optional, one normally assumes it to be false when omitted; but if an
integer is not specified, it is rather difficult to guess its value.

This new field breaks some test outputs; fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c              |  4 +++-
 qapi/block-core.json       |  5 ++++-
 tests/qemu-iotests/060.out |  1 +
 tests/qemu-iotests/065     | 23 +++++++++++++++--------
 tests/qemu-iotests/067.out |  5 +++++
 tests/qemu-iotests/082.out |  7 +++++++
 tests/qemu-iotests/089.out |  2 ++
 7 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f57aff9..d70e927 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2475,7 +2475,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
     };
     if (s->qcow_version == 2) {
         *spec_info->qcow2 = (ImageInfoSpecificQCow2){
-            .compat = g_strdup("0.10"),
+            .compat             = g_strdup("0.10"),
+            .refcount_width     = s->refcount_bits,
         };
     } else if (s->qcow_version == 3) {
         *spec_info->qcow2 = (ImageInfoSpecificQCow2){
@@ -2486,6 +2487,7 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs)
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .refcount_width     = s->refcount_bits,
         };
     }
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b7083fb..e3a3cb7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -41,13 +41,16 @@
 # @corrupt: #optional true if the image has been marked corrupt; only valid for
 #           compat >= 1.1 (since 2.2)
 #
+# @refcount-width: width of a refcount entry in bits (since 2.3)
+#
 # Since: 1.7
 ##
 { 'type': 'ImageInfoSpecificQCow2',
   'data': {
       'compat': 'str',
       '*lazy-refcounts': 'bool',
-      '*corrupt': 'bool'
+      '*corrupt': 'bool',
+      'refcount-width': 'int'
   } }
 
 ##
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 9419da1..17b3eaf 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -19,6 +19,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: true
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot be opened read/write
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 8d3a9c9..8539aeb 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -88,34 +88,41 @@ class TestQMP(TestImageInfoSpecific):
 class TestQCow2(TestQemuImgInfo):
     '''Testing a qcow2 version 2 image'''
     img_options = 'compat=0.10'
-    json_compare = { 'compat': '0.10' }
-    human_compare = [ 'compat: 0.10' ]
+    json_compare = { 'compat': '0.10', 'refcount-width': 16 }
+    human_compare = [ 'compat: 0.10', 'refcount width: 16' ]
 
 class TestQCow3NotLazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
-    json_compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': False }
-    human_compare = [ 'compat: 1.1', 'lazy refcounts: false', 'corrupt: false' ]
+    json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
+                     'refcount-width': 16, 'corrupt': False }
+    human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
+                      'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
-    json_compare = { 'compat': '1.1', 'lazy-refcounts': True, 'corrupt': False }
-    human_compare = [ 'compat: 1.1', 'lazy refcounts: true', 'corrupt: false' ]
+    json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
+                     'refcount-width': 16, 'corrupt': False }
+    human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
+                      'refcount width: 16', 'corrupt: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
        with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
     qemu_options = 'lazy-refcounts=on'
-    compare = { 'compat': '1.1', 'lazy-refcounts': False, 'corrupt': False }
+    compare = { 'compat': '1.1', 'lazy-refcounts': False,
+                'refcount-width': 16, 'corrupt': False }
+
 
 class TestQCow3LazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled, opening
        with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
     qemu_options = 'lazy-refcounts=off'
-    compare = { 'compat': '1.1', 'lazy-refcounts': True, 'corrupt': False }
+    compare = { 'compat': '1.1', 'lazy-refcounts': True,
+                'refcount-width': 16, 'corrupt': False }
 
 TestImageInfoSpecific = None
 TestQemuImgInfo = None
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 929dc74..0deb97c 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -32,6 +32,7 @@ Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk -device virti
                         "data": {
                             "compat": "1.1",
                             "lazy-refcounts": false,
+                            "refcount-width": 16,
                             "corrupt": false
                         }
                     },
@@ -202,6 +203,7 @@ Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,if=none,id=disk
                         "data": {
                             "compat": "1.1",
                             "lazy-refcounts": false,
+                            "refcount-width": 16,
                             "corrupt": false
                         }
                     },
@@ -402,6 +404,7 @@ Testing:
                         "data": {
                             "compat": "1.1",
                             "lazy-refcounts": false,
+                            "refcount-width": 16,
                             "corrupt": false
                         }
                     },
@@ -581,6 +584,7 @@ Testing:
                         "data": {
                             "compat": "1.1",
                             "lazy-refcounts": false,
+                            "refcount-width": 16,
                             "corrupt": false
                         }
                     },
@@ -686,6 +690,7 @@ Testing:
                         "data": {
                             "compat": "1.1",
                             "lazy-refcounts": false,
+                            "refcount-width": 16,
                             "corrupt": false
                         }
                     },
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 0a3ab5a..4b14b4f 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -21,6 +21,7 @@ cluster_size: 4096
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
@@ -35,6 +36,7 @@ cluster_size: 8192
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
@@ -199,6 +201,7 @@ cluster_size: 4096
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: convert -O qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -212,6 +215,7 @@ cluster_size: 8192
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: convert -O qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -361,6 +365,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: amend -f qcow2 -o size=130M -o lazy_refcounts=off TEST_DIR/t.qcow2
@@ -374,6 +379,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: false
 
 Testing: amend -f qcow2 -o size=8M -o lazy_refcounts=on -o size=132M TEST_DIR/t.qcow2
@@ -387,6 +393,7 @@ cluster_size: 65536
 Format specific information:
     compat: 1.1
     lazy refcounts: true
+    refcount width: 16
     corrupt: false
 
 Testing: amend -f qcow2 -o size=4M,size=148M TEST_DIR/t.qcow2
diff --git a/tests/qemu-iotests/089.out b/tests/qemu-iotests/089.out
index b2b0390..d788b46 100644
--- a/tests/qemu-iotests/089.out
+++ b/tests/qemu-iotests/089.out
@@ -41,6 +41,7 @@ vm state offset: 512 MiB
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: false
 format name: IMGFMT
 cluster size: 64 KiB
@@ -48,5 +49,6 @@ vm state offset: 512 MiB
 Format specific information:
     compat: 1.1
     lazy refcounts: false
+    refcount width: 16
     corrupt: false
 *** done
-- 
1.9.3


* [Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
@ 2014-11-14 13:05 ` Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Refcounts may have a width of up to 64 bits, so qemu should use the same
width to represent refcount values internally.

Since for instance qcow2_get_refcount() signals an error by returning a
negative value, refcount values are generally signed to be able to
represent those error values correctly. This limits the maximum refcount
value supported by qemu to INT64_MAX (= 63 bits), as established in
"qcow2: Add two new fields to BDRVQcowState".

This limitation should have no implications in practice for normal valid
images. If the MSb in a 64 bit refcount value is set, we can safely
assume the value to be invalid (because reaching such high refcounts is
impossible due to other limitations of the qcow2 format).
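
The convention can be sketched as follows. Both helper names are
hypothetical, and rejecting a value with the MSb set via -EINVAL is an
assumption made for illustration; the patch itself only states that such
values can be considered invalid.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: map a raw 64 bit on-disk entry to the signed
 * convention, i.e. a refcount >= 0 on success or -errno on error. */
static int64_t checked_refcount(uint64_t raw)
{
    if (raw > (uint64_t)INT64_MAX) {
        /* MSb set: not representable as a non-negative int64_t.
         * Returning -EINVAL here is an assumption of this sketch. */
        return -EINVAL;
    }
    return (int64_t)raw;
}

/* Hypothetical caller-side pattern: keep the wide result in an int64_t
 * and narrow it to int only once it is known to be an error code. */
static int propagate_error(int64_t ret64)
{
    if (ret64 < 0) {
        return (int)ret64;  /* -errno always fits into an int */
    }
    return 0;
}
```

The second helper mirrors the ret64 pattern used in
expand_zero_clusters_in_l1() in this patch: a valid refcount may exceed
INT_MAX, so only the error case is assigned to an int.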

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-cluster.c  |  9 ++++++---
 block/qcow2-refcount.c | 37 ++++++++++++++++++++-----------------
 block/qcow2.h          |  7 ++++---
 3 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index df0b2c9..ab43902 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1640,7 +1640,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     for (i = 0; i < l1_size; i++) {
         uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
         bool l2_dirty = false;
-        int l2_refcount;
+        int64_t l2_refcount;
 
         if (!l2_offset) {
             /* unallocated */
@@ -1696,14 +1696,17 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 }
 
                 if (l2_refcount > 1) {
+                    int64_t ret64;
+
                     /* For shared L2 tables, set the refcount accordingly (it is
                      * already 1 and needs to be l2_refcount) */
-                    ret = qcow2_update_cluster_refcount(bs,
+                    ret64 = qcow2_update_cluster_refcount(bs,
                             offset >> s->cluster_bits, l2_refcount - 1,
                             QCOW2_DISCARD_OTHER);
-                    if (ret < 0) {
+                    if (ret64 < 0) {
                         qcow2_free_clusters(bs, offset, s->cluster_size,
                                             QCOW2_DISCARD_OTHER);
+                        ret = ret64;
                         goto fail;
                     }
                 }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6016211..6e06531 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -91,14 +91,14 @@ static int load_refcount_block(BlockDriverState *bs,
  * return value is the refcount of the cluster, negative values are -errno
  * and indicate an error.
  */
-int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
+int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t refcount_table_index, block_index;
     int64_t refcount_block_offset;
     int ret;
     uint16_t *refcount_block;
-    uint16_t refcount;
+    int64_t refcount;
 
     refcount_table_index = cluster_index >> s->refcount_block_bits;
     if (refcount_table_index >= s->refcount_table_size)
@@ -556,9 +556,10 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     for(cluster_offset = start; cluster_offset <= last;
         cluster_offset += s->cluster_size)
     {
-        int block_index, refcount;
+        int block_index;
         int64_t cluster_index = cluster_offset >> s->cluster_bits;
         int64_t table_index = cluster_index >> s->refcount_block_bits;
+        int64_t refcount;
 
         /* Load the refcount block and allocate it if needed */
         if (table_index != old_table_index) {
@@ -634,10 +635,10 @@ fail:
  * If the return value is non-negative, it is the new refcount of the cluster.
  * If it is negative, it is -errno and indicates an error.
  */
-int qcow2_update_cluster_refcount(BlockDriverState *bs,
-                                  int64_t cluster_index,
-                                  int addend,
-                                  enum qcow2_discard_type type)
+int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
+                                      int64_t cluster_index,
+                                      int addend,
+                                      enum qcow2_discard_type type)
 {
     BDRVQcowState *s = bs->opaque;
     int ret;
@@ -663,7 +664,7 @@ static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t i, nb_clusters;
-    int refcount;
+    int64_t refcount;
 
     nb_clusters = size_to_clusters(s, size);
 retry:
@@ -722,7 +723,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
     BDRVQcowState *s = bs->opaque;
     uint64_t cluster_index;
     uint64_t i;
-    int refcount, ret;
+    int64_t refcount;
+    int ret;
 
     assert(nb_clusters >= 0);
     if (nb_clusters == 0) {
@@ -878,8 +880,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table, *l2_table, l2_offset, offset, l1_size2;
     bool l1_allocated = false;
-    int64_t old_offset, old_l2_offset;
-    int i, j, l1_modified = 0, nb_csectors, refcount;
+    int64_t old_offset, old_l2_offset, refcount;
+    int i, j, l1_modified = 0, nb_csectors;
     int ret;
 
     l2_table = NULL;
@@ -1341,7 +1343,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcowState *s = bs->opaque;
     uint64_t *l2_table = qemu_blockalign(bs, s->cluster_size);
     int ret;
-    int refcount;
+    int64_t refcount;
     int i, j;
 
     for (i = 0; i < s->l1_size; i++) {
@@ -1360,7 +1362,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
         if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
             fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_index=%d "
-                    "l1_entry=%" PRIx64 " refcount=%d\n",
+                    "l1_entry=%" PRIx64 " refcount=%" PRId64 "\n",
                     fix & BDRV_FIX_ERRORS ? "Repairing" :
                                             "ERROR",
                     i, l1_entry, refcount);
@@ -1403,7 +1405,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                 }
                 if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
                     fprintf(stderr, "%s OFLAG_COPIED data cluster: "
-                            "l2_entry=%" PRIx64 " refcount=%d\n",
+                            "l2_entry=%" PRIx64 " refcount=%" PRId64 "\n",
                             fix & BDRV_FIX_ERRORS ? "Repairing" :
                                                     "ERROR",
                             l2_entry, refcount);
@@ -1628,8 +1630,8 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                               uint16_t *refcount_table, int64_t nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t i;
-    int refcount1, refcount2, ret;
+    int64_t i, refcount1, refcount2;
+    int ret;
 
     for (i = 0, *highest_cluster = 0; i < nb_clusters; i++) {
         refcount1 = qcow2_get_refcount(bs, i);
@@ -1657,7 +1659,8 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                 num_fixed = &res->corruptions_fixed;
             }
 
-            fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
+            fprintf(stderr, "%s cluster %" PRId64 " refcount=%" PRId64
+                    " reference=%" PRId64 "\n",
                    num_fixed != NULL     ? "Repairing" :
                    refcount1 < refcount2 ? "ERROR" :
                                            "Leaked",
diff --git a/block/qcow2.h b/block/qcow2.h
index 4d8c902..0f8eb15 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -489,10 +489,11 @@ void qcow2_signal_corruption(BlockDriverState *bs, bool fatal, int64_t offset,
 int qcow2_refcount_init(BlockDriverState *bs);
 void qcow2_refcount_close(BlockDriverState *bs);
 
-int qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index);
+int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index);
 
-int qcow2_update_cluster_refcount(BlockDriverState *bs, int64_t cluster_index,
-                                  int addend, enum qcow2_discard_type type);
+int64_t qcow2_update_cluster_refcount(BlockDriverState *bs,
+                                      int64_t cluster_index, int addend,
+                                      enum qcow2_discard_type type);
 
 int64_t qcow2_alloc_clusters(BlockDriverState *bs, uint64_t size);
 int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
-- 
1.9.3


* [Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes()
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (2 preceding siblings ...)
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values Max Reitz
@ 2014-11-14 13:05 ` Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

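qcow2_update_cluster_refcount() may fail, and qcow2_alloc_bytes() should
handle that case: propagate the error to the caller and free any cluster
that was newly allocated for the failed request.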

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-refcount.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6e06531..be4e5fe 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,7 +761,8 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t offset, cluster_offset;
+    int64_t offset, cluster_offset, new_cluster;
+    int64_t ret;
     int free_in_cluster;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
@@ -783,23 +784,32 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         free_in_cluster -= size;
         if (free_in_cluster == 0)
             s->free_byte_offset = 0;
-        if (offset_into_cluster(s, offset) != 0)
-            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-                                          QCOW2_DISCARD_NEVER);
+        if (offset_into_cluster(s, offset) != 0) {
+            ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
+                                                1, QCOW2_DISCARD_NEVER);
+            if (ret < 0) {
+                return ret;
+            }
+        }
     } else {
-        offset = qcow2_alloc_clusters(bs, s->cluster_size);
-        if (offset < 0) {
-            return offset;
+        new_cluster = qcow2_alloc_clusters(bs, s->cluster_size);
+        if (new_cluster < 0) {
+            return new_cluster;
         }
         cluster_offset = start_of_cluster(s, s->free_byte_offset);
-        if ((cluster_offset + s->cluster_size) == offset) {
+        if ((cluster_offset + s->cluster_size) == new_cluster) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
-            qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1,
-                                          QCOW2_DISCARD_NEVER);
+            ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
+                                                1, QCOW2_DISCARD_NEVER);
+            if (ret < 0) {
+                qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+                return ret;
+            }
             s->free_byte_offset += size;
         } else {
-            s->free_byte_offset = offset;
+            s->free_byte_offset = new_cluster;
             goto redo;
         }
     }
-- 
1.9.3


* [Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (3 preceding siblings ...)
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
@ 2014-11-14 13:05 ` Max Reitz
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation Max Reitz
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
the refcount is increased accordingly. However, if this would lead to an
overflow, the function should not reuse this cluster and should allocate
a new one instead.
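
As a rough illustration of the idea (not the code from this patch), the
following sketch models a sub-cluster allocator that refuses to reuse a
cluster whose refcount is already at its maximum. All names prefixed
with sketch_/Sketch are hypothetical, the refcounts live in a plain
in-memory array, cluster 0 is assumed to be reserved for the header, and
the "contiguous new cluster" case of qcow2_alloc_bytes() is left out.

```c
#include <stdint.h>

#define SKETCH_CLUSTER_SIZE 512
#define SKETCH_NB_CLUSTERS  8

/* Hypothetical, simplified allocator state (not BDRVQcowState) */
struct SketchAlloc {
    uint64_t refcount[SKETCH_NB_CLUSTERS];
    uint64_t refcount_max;
    int64_t free_byte_offset; /* 0: no partially used cluster */
};

/* Take the first free cluster; cluster 0 is assumed to hold the header */
static int64_t sketch_alloc_cluster(struct SketchAlloc *a)
{
    int i;

    for (i = 1; i < SKETCH_NB_CLUSTERS; i++) {
        if (a->refcount[i] == 0) {
            a->refcount[i] = 1;
            return (int64_t)i * SKETCH_CLUSTER_SIZE;
        }
    }
    return -1; /* image full */
}

/* Allocate size bytes (0 < size <= cluster size); returns offset or -1 */
static int64_t sketch_alloc_bytes(struct SketchAlloc *a, int size)
{
    int64_t offset;

    if (a->free_byte_offset != 0) {
        int64_t cluster = a->free_byte_offset / SKETCH_CLUSTER_SIZE;
        int free_in_cluster = SKETCH_CLUSTER_SIZE -
            (int)(a->free_byte_offset % SKETCH_CLUSTER_SIZE);

        /* Reusing the cluster requires one more reference; only do so if
         * the refcount has not reached its maximum yet */
        if (size <= free_in_cluster &&
            a->refcount[cluster] < a->refcount_max) {
            offset = a->free_byte_offset;
            a->refcount[cluster]++;
            a->free_byte_offset =
                (size == free_in_cluster) ? 0 : offset + size;
            return offset;
        }
        /* Too little space or saturated refcount: use a fresh cluster */
    }

    offset = sketch_alloc_cluster(a);
    if (offset < 0) {
        return offset;
    }
    a->free_byte_offset =
        (size == SKETCH_CLUSTER_SIZE) ? 0 : offset + size;
    return offset;
}
```

With refcount_max set to 2, two 100-byte allocations share one cluster
and a third one is placed in a new cluster, even though the first
cluster still has room.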

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-refcount.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index be4e5fe..66c78c0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -761,12 +761,13 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t offset, cluster_offset, new_cluster;
+    int64_t offset, cluster_offset, new_cluster, refcount;
     int64_t ret;
     int free_in_cluster;
 
     BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES);
     assert(size > 0 && size <= s->cluster_size);
+ redo:
     if (s->free_byte_offset == 0) {
         offset = qcow2_alloc_clusters(bs, s->cluster_size);
         if (offset < 0) {
@@ -774,12 +775,25 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         }
         s->free_byte_offset = offset;
     }
- redo:
+
     free_in_cluster = s->cluster_size -
         offset_into_cluster(s, s->free_byte_offset);
     if (size <= free_in_cluster) {
         /* enough space in current cluster */
         offset = s->free_byte_offset;
+
+        if (offset_into_cluster(s, offset) != 0) {
+            /* We will have to increase the refcount of this cluster; if the
+             * maximum has been reached already, this cluster cannot be used */
+            refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits);
+            if (refcount < 0) {
+                return refcount;
+            } else if (refcount == s->refcount_max) {
+                s->free_byte_offset = 0;
+                goto redo;
+            }
+        }
+
         s->free_byte_offset += size;
         free_in_cluster -= size;
         if (free_in_cluster == 0)
@@ -800,6 +814,20 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
         if ((cluster_offset + s->cluster_size) == new_cluster) {
             /* we are lucky: contiguous data */
             offset = s->free_byte_offset;
+
+            /* Same as above: In order to reuse the cluster, the refcount has to
+             * be increased; if that will not work, we are not so lucky after
+             * all */
+            refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits);
+            if (refcount < 0) {
+                qcow2_free_clusters(bs, new_cluster, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+                return refcount;
+            } else if (refcount == s->refcount_max) {
+                s->free_byte_offset = new_cluster;
+                goto redo;
+            }
+
             ret = qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits,
                                                 1, QCOW2_DISCARD_NEVER);
             if (ret < 0) {
-- 
1.9.3


* [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (4 preceding siblings ...)
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
@ 2014-11-14 13:05 ` Max Reitz
  2014-11-15 16:50   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification Max Reitz
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a helper function for reallocating a refcount array, independently
of the refcount order. The newly allocated space is zeroed and the
function handles failed reallocations gracefully.
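
The size computation and the zero-extending reallocation can be sketched
in isolation roughly as follows. This is a simplified illustration, not
the helper added by this patch: the sketch_ names are hypothetical, it
uses plain realloc() instead of g_try_realloc(), takes the refcount
order as a parameter instead of reading it from BDRVQcowState, and omits
the rounding to cluster size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed for `entries` refcounts of (1 << refcount_order) bits each */
static size_t sketch_array_byte_size(int refcount_order, uint64_t entries)
{
    assert(refcount_order >= 0 && refcount_order <= 6);

    if (refcount_order < 3) {
        /* sub-byte width: several entries share a byte, so round up */
        int shift = 3 - refcount_order;
        return (size_t)((entries + (1u << shift) - 1) >> shift);
    }
    /* one or more whole bytes per entry */
    return (size_t)(entries << (refcount_order - 3));
}

/* Resize *array to new_entries entries and zero any newly added tail.
 * On failure, *array and *entries are left untouched.
 * Returns 0 on success, -1 if the allocation failed. */
static int sketch_realloc_array(int refcount_order, void **array,
                                uint64_t *entries, uint64_t new_entries)
{
    size_t old_bytes = sketch_array_byte_size(refcount_order, *entries);
    size_t new_bytes = sketch_array_byte_size(refcount_order, new_entries);
    void *new_ptr;

    if (new_bytes == 0) {
        free(*array);
        *array = NULL;
        *entries = 0;
        return 0;
    }

    new_ptr = realloc(*array, new_bytes);
    if (!new_ptr) {
        return -1;
    }

    if (new_bytes > old_bytes) {
        memset((char *)new_ptr + old_bytes, 0, new_bytes - old_bytes);
    }

    *array = new_ptr;
    *entries = new_entries;
    return 0;
}
```

Starting from a NULL array with zero entries, a single call allocates
and zeroes the whole array, which is how one helper can serve both the
initial allocation and later growth.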

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 121 +++++++++++++++++++++++++++++--------------------
 1 file changed, 72 insertions(+), 49 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 66c78c0..18fcd0d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1108,6 +1108,54 @@ fail:
 /* refcount checking functions */
 
 
+static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
+{
+    if (s->refcount_order < 3) {
+        /* sub-byte width */
+        int shift = 3 - s->refcount_order;
+        return (entries + (1 << shift) - 1) >> shift;
+    } else if (s->refcount_order == 3) {
+        /* byte width */
+        return entries;
+    } else {
+        /* multiple bytes wide */
+
+        /* This assertion holds because there is no way we can address more than
+         * 2^(64 - 9) clusters at once (with cluster size 512 = 2^9, and because
+         * offsets have to be representable in bytes); due to every cluster
+         * corresponding to one refcount entry and because refcount_order has to
+         * be below 7, we are far below that limit */
+        assert(!(entries >> (64 - (s->refcount_order - 3))));
+
+        return entries << (s->refcount_order - 3);
+    }
+}
+
+static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
+                                  int64_t *size, int64_t new_size)
+{
+    /* Round to clusters so the array can be directly written to disk */
+    size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
+                                    s->cluster_size);
+    size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
+                                    s->cluster_size);
+    uint16_t *new_ptr;
+
+    new_ptr = g_try_realloc(*array, new_byte_size);
+    if (new_byte_size && !new_ptr) {
+        return -ENOMEM;
+    }
+
+    if (new_ptr) {
+        memset((void *)((uintptr_t)new_ptr + old_byte_size), 0,
+               new_byte_size - old_byte_size);
+    }
+
+    *array = new_ptr;
+    *size  = new_size;
+
+    return 0;
+}
 
 /*
  * Increases the refcount for a range of clusters in a given refcount table.
@@ -1124,6 +1172,7 @@ static int inc_refcounts(BlockDriverState *bs,
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t start, last, cluster_offset, k;
+    int ret;
 
     if (size <= 0) {
         return 0;
@@ -1135,23 +1184,12 @@ static int inc_refcounts(BlockDriverState *bs,
         cluster_offset += s->cluster_size) {
         k = cluster_offset >> s->cluster_bits;
         if (k >= *refcount_table_size) {
-            int64_t old_refcount_table_size = *refcount_table_size;
-            uint16_t *new_refcount_table;
-
-            *refcount_table_size = k + 1;
-            new_refcount_table = g_try_realloc(*refcount_table,
-                                               *refcount_table_size *
-                                               sizeof(**refcount_table));
-            if (!new_refcount_table) {
-                *refcount_table_size = old_refcount_table_size;
+            ret = realloc_refcount_array(s, refcount_table,
+                                         refcount_table_size, k + 1);
+            if (ret < 0) {
                 res->check_errors++;
-                return -ENOMEM;
+                return ret;
             }
-            *refcount_table = new_refcount_table;
-
-            memset(*refcount_table + old_refcount_table_size, 0,
-                   (*refcount_table_size - old_refcount_table_size) *
-                   sizeof(**refcount_table));
         }
 
         if (++(*refcount_table)[k] == 0) {
@@ -1518,8 +1556,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                     fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
 
             if (fix & BDRV_FIX_ERRORS) {
-                int64_t old_nb_clusters = *nb_clusters;
-                uint16_t *new_refcount_table;
+                int64_t new_nb_clusters;
 
                 if (offset > INT64_MAX - s->cluster_size) {
                     ret = -EINVAL;
@@ -1536,22 +1573,15 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                     goto resize_fail;
                 }
 
-                *nb_clusters = size_to_clusters(s, size);
-                assert(*nb_clusters >= old_nb_clusters);
+                new_nb_clusters = size_to_clusters(s, size);
+                assert(new_nb_clusters >= *nb_clusters);
 
-                new_refcount_table = g_try_realloc(*refcount_table,
-                                                   *nb_clusters *
-                                                   sizeof(**refcount_table));
-                if (!new_refcount_table) {
-                    *nb_clusters = old_nb_clusters;
+                ret = realloc_refcount_array(s, refcount_table,
+                                             nb_clusters, new_nb_clusters);
+                if (ret < 0) {
                     res->check_errors++;
-                    return -ENOMEM;
+                    return ret;
                 }
-                *refcount_table = new_refcount_table;
-
-                memset(*refcount_table + old_nb_clusters, 0,
-                       (*nb_clusters - old_nb_clusters) *
-                       sizeof(**refcount_table));
 
                 if (cluster >= *nb_clusters) {
                     ret = -EINVAL;
@@ -1611,10 +1641,12 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     int ret;
 
     if (!*refcount_table) {
-        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
-        if (*nb_clusters && *refcount_table == NULL) {
+        int64_t old_size = 0;
+        ret = realloc_refcount_array(s, refcount_table,
+                                     &old_size, *nb_clusters);
+        if (ret < 0) {
             res->check_errors++;
-            return -ENOMEM;
+            return ret;
         }
     }
 
@@ -1746,6 +1778,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
     int64_t cluster = *first_free_cluster, i;
     bool first_gap = true;
     int contiguous_free_clusters;
+    int ret;
 
     /* Starting at *first_free_cluster, find a range of at least cluster_count
      * continuously free clusters */
@@ -1775,28 +1808,18 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
     /* If no such range could be found, grow the in-memory refcount table
      * accordingly to append free clusters at the end of the image */
     if (contiguous_free_clusters < cluster_count) {
-        int64_t old_imrt_nb_clusters = *imrt_nb_clusters;
-        uint16_t *new_refcount_table;
-
         /* contiguous_free_clusters clusters are already empty at the image end;
          * we need cluster_count clusters; therefore, we have to allocate
          * cluster_count - contiguous_free_clusters new clusters at the end of
          * the image (which is the current value of cluster; note that cluster
          * may exceed old_imrt_nb_clusters if *first_free_cluster pointed beyond
          * the image end) */
-        *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
-        new_refcount_table = g_try_realloc(*refcount_table,
-                                           *imrt_nb_clusters *
-                                           sizeof(**refcount_table));
-        if (!new_refcount_table) {
-            *imrt_nb_clusters = old_imrt_nb_clusters;
-            return -ENOMEM;
-        }
-        *refcount_table = new_refcount_table;
-
-        memset(*refcount_table + old_imrt_nb_clusters, 0,
-               (*imrt_nb_clusters - old_imrt_nb_clusters) *
-               sizeof(**refcount_table));
+        ret = realloc_refcount_array(s, refcount_table, imrt_nb_clusters,
+                                     cluster + cluster_count
+                                     - contiguous_free_clusters);
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     /* Go back to the first free cluster */
-- 
1.9.3


* [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (5 preceding siblings ...)
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-15 17:02   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers " Max Reitz
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Since refcounts do not always have to be a uint16_t, refcount blocks and
arrays in memory should not have a specific type; they become pointers to
void and are accessed through two helper functions (a getter and a
setter). Those functions are called indirectly through function pointers
in the BDRVQcowState so they may later be exchanged for different
refcount orders.
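
A stand-alone sketch of this indirection might look like the following.
It is an illustration under assumptions rather than the code of this
patch: the Sketch/sketch_ names are hypothetical, the 8-bit accessors are
invented here only to show how another width could be plugged in, and
the big-endian conversion is done byte by byte instead of using
be16_to_cpu()/cpu_to_be16().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t SketchGetRefcount(const void *array, uint64_t index);
typedef void SketchSetRefcount(void *array, uint64_t index, uint64_t value);

/* Refcount order 4: 16 bit entries, stored big-endian */
static uint64_t sketch_get_ro4(const void *array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)array + 2 * index;
    return ((uint64_t)p[0] << 8) | p[1];
}

static void sketch_set_ro4(void *array, uint64_t index, uint64_t value)
{
    uint8_t *p = (uint8_t *)array + 2 * index;
    assert(!(value >> 16));
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

/* Hypothetical refcount order 3: 8 bit entries (not part of this patch) */
static uint64_t sketch_get_ro3(const void *array, uint64_t index)
{
    return ((const uint8_t *)array)[index];
}

static void sketch_set_ro3(void *array, uint64_t index, uint64_t value)
{
    assert(!(value >> 8));
    ((uint8_t *)array)[index] = (uint8_t)value;
}

/* Stand-in for the two function pointers kept in the driver state */
struct SketchState {
    SketchGetRefcount *get_refcount;
    SketchSetRefcount *set_refcount;
};

/* Width-agnostic caller: increment one entry, -1 if it is saturated */
static int sketch_inc(const struct SketchState *s, void *array,
                      uint64_t index, uint64_t refcount_max)
{
    uint64_t refcount = s->get_refcount(array, index);

    if (refcount == refcount_max) {
        return -1;
    }
    s->set_refcount(array, index, refcount + 1);
    return 0;
}
```

The caller never touches the array directly, so switching the entry
width only means installing a different pair of accessors.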

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 128 ++++++++++++++++++++++++++++++-------------------
 block/qcow2.h          |   8 ++++
 2 files changed, 87 insertions(+), 49 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 18fcd0d..b3ca7d2 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,6 +32,11 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length,
                             int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+
+static void set_refcount_ro4(void *refcount_array, uint64_t index,
+                             uint64_t value);
+
 
 /*********************************************************/
 /* refcount handling */
@@ -42,6 +47,9 @@ int qcow2_refcount_init(BlockDriverState *bs)
     unsigned int refcount_table_size2, i;
     int ret;
 
+    s->get_refcount = &get_refcount_ro4;
+    s->set_refcount = &set_refcount_ro4;
+
     assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
     refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
     s->refcount_table = g_try_malloc(refcount_table_size2);
@@ -72,6 +80,19 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index)
+{
+    return be16_to_cpu(((const uint16_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro4(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 16));
+    ((uint16_t *)refcount_array)[index] = cpu_to_be16(value);
+}
+
+
 static int load_refcount_block(BlockDriverState *bs,
                                int64_t refcount_block_offset,
                                void **refcount_block)
@@ -97,7 +118,7 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
     uint64_t refcount_table_index, block_index;
     int64_t refcount_block_offset;
     int ret;
-    uint16_t *refcount_block;
+    void *refcount_block;
     int64_t refcount;
 
     refcount_table_index = cluster_index >> s->refcount_block_bits;
@@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
     }
 
     ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        (void**) &refcount_block);
+                          &refcount_block);
     if (ret < 0) {
         return ret;
     }
 
     block_index = cluster_index & (s->refcount_block_size - 1);
-    refcount = be16_to_cpu(refcount_block[block_index]);
+    refcount = s->get_refcount(refcount_block, block_index);
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache,
-        (void**) &refcount_block);
+    ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
     if (ret < 0) {
         return ret;
     }
 
+    if (refcount < 0) {
+        /* overflow */
+        return -ERANGE;
+    }
+
     return refcount;
 }
 
@@ -169,7 +194,7 @@ static int in_same_refcount_block(BDRVQcowState *s, uint64_t offset_a,
  * Returns 0 on success or -errno in error case
  */
 static int alloc_refcount_block(BlockDriverState *bs,
-    int64_t cluster_index, uint16_t **refcount_block)
+                                int64_t cluster_index, void **refcount_block)
 {
     BDRVQcowState *s = bs->opaque;
     unsigned int refcount_table_index;
@@ -196,7 +221,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
             }
 
              return load_refcount_block(bs, refcount_block_offset,
-                 (void**) refcount_block);
+                                        refcount_block);
         }
     }
 
@@ -256,7 +281,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         /* The block describes itself, need to update the cache */
         int block_index = (new_block >> s->cluster_bits) &
             (s->refcount_block_size - 1);
-        (*refcount_block)[block_index] = cpu_to_be16(1);
+        s->set_refcount(*refcount_block, block_index, 1);
     } else {
         /* Described somewhere else. This can recurse at most twice before we
          * arrive at a block that describes itself. */
@@ -274,7 +299,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         /* Initialize the new refcount block only after updating its refcount,
          * update_refcount uses the refcount cache itself */
         ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+                                    refcount_block);
         if (ret < 0) {
             goto fail_block;
         }
@@ -308,7 +333,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         return -EAGAIN;
     }
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+    ret = qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
     if (ret < 0) {
         goto fail_block;
     }
@@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
         s->cluster_size;
     uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
     uint64_t *new_table = g_try_new0(uint64_t, table_size);
-    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
+    void *new_blocks = g_try_malloc0_n(blocks_clusters, s->cluster_size);
 
     assert(table_size > 0 && blocks_clusters > 0);
     if (new_table == NULL || new_blocks == NULL) {
@@ -384,7 +409,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     uint64_t table_clusters = size_to_clusters(s, table_size * sizeof(uint64_t));
     int block = 0;
     for (i = 0; i < table_clusters + blocks_clusters; i++) {
-        new_blocks[block++] = cpu_to_be16(1);
+        s->set_refcount(new_blocks, block++, 1);
     }
 
     /* Write refcount blocks to disk */
@@ -437,7 +462,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t),
                         QCOW2_DISCARD_OTHER);
 
-    ret = load_refcount_block(bs, new_block, (void**) refcount_block);
+    ret = load_refcount_block(bs, new_block, refcount_block);
     if (ret < 0) {
         return ret;
     }
@@ -452,7 +477,7 @@ fail_table:
     g_free(new_table);
 fail_block:
     if (*refcount_block != NULL) {
-        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+        qcow2_cache_put(bs, s->refcount_block_cache, refcount_block);
     }
     return ret;
 }
@@ -532,7 +557,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
 {
     BDRVQcowState *s = bs->opaque;
     int64_t start, last, cluster_offset;
-    uint16_t *refcount_block = NULL;
+    void *refcount_block = NULL;
     int64_t old_table_index = -1;
     int ret;
 
@@ -583,7 +608,12 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         /* we can update the count and save it */
         block_index = cluster_index & (s->refcount_block_size - 1);
 
-        refcount = be16_to_cpu(refcount_block[block_index]);
+        refcount = s->get_refcount(refcount_block, block_index);
+        if (refcount < 0) {
+            ret = -ERANGE;
+            goto fail;
+        }
+
         refcount += addend;
         if (refcount < 0 || refcount > s->refcount_max) {
             ret = -EINVAL;
@@ -592,7 +622,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         if (refcount == 0 && cluster_index < s->free_cluster_index) {
             s->free_cluster_index = cluster_index;
         }
-        refcount_block[block_index] = cpu_to_be16(refcount);
+        s->set_refcount(refcount_block, block_index, refcount);
 
         if (refcount == 0 && s->discard_passthrough[type]) {
             update_refcount_discard(bs, cluster_offset, s->cluster_size);
@@ -608,8 +638,7 @@ fail:
     /* Write last changed block to disk */
     if (refcount_block) {
         int wret;
-        wret = qcow2_cache_put(bs, s->refcount_block_cache,
-            (void**) &refcount_block);
+        wret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
         if (wret < 0) {
             return ret < 0 ? ret : wret;
         }
@@ -1131,7 +1160,7 @@ static size_t refcount_array_byte_size(BDRVQcowState *s, uint64_t entries)
     }
 }
 
-static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
+static int realloc_refcount_array(BDRVQcowState *s, void **array,
                                   int64_t *size, int64_t new_size)
 {
     /* Round to clusters so the array can be directly written to disk */
@@ -1139,7 +1168,7 @@ static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
                                     s->cluster_size);
     size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
                                     s->cluster_size);
-    uint16_t *new_ptr;
+    void *new_ptr;
 
     new_ptr = g_try_realloc(*array, new_byte_size);
     if (new_byte_size && !new_ptr) {
@@ -1166,12 +1195,13 @@ static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
  */
 static int inc_refcounts(BlockDriverState *bs,
                          BdrvCheckResult *res,
-                         uint16_t **refcount_table,
+                         void **refcount_table,
                          int64_t *refcount_table_size,
                          int64_t offset, int64_t size)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t start, last, cluster_offset, k;
+    int64_t refcount;
     int ret;
 
     if (size <= 0) {
@@ -1192,11 +1222,14 @@ static int inc_refcounts(BlockDriverState *bs,
             }
         }
 
-        if (++(*refcount_table)[k] == 0) {
+        refcount = s->get_refcount(*refcount_table, k);
+        if (refcount == s->refcount_max) {
             fprintf(stderr, "ERROR: overflow cluster offset=0x%" PRIx64
                     "\n", cluster_offset);
             res->corruptions++;
+            continue;
         }
+        s->set_refcount(*refcount_table, k, refcount + 1);
     }
 
     return 0;
@@ -1216,7 +1249,7 @@ enum {
  * error occurred.
  */
 static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
-    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
+    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
     int flags)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1334,7 +1367,7 @@ fail:
  */
 static int check_refcounts_l1(BlockDriverState *bs,
                               BdrvCheckResult *res,
-                              uint16_t **refcount_table,
+                              void **refcount_table,
                               int64_t *refcount_table_size,
                               int64_t l1_table_offset, int l1_size,
                               int flags)
@@ -1531,7 +1564,7 @@ fail:
  */
 static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                            BdrvCheckMode fix, bool *rebuild,
-                           uint16_t **refcount_table, int64_t *nb_clusters)
+                           void **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i, size;
@@ -1616,9 +1649,10 @@ resize_fail:
             if (ret < 0) {
                 return ret;
             }
-            if ((*refcount_table)[cluster] != 1) {
+            if (s->get_refcount(*refcount_table, cluster) != 1) {
                 fprintf(stderr, "ERROR refcount block %" PRId64
-                        " refcount=%d\n", i, (*refcount_table)[cluster]);
+                        " refcount=%" PRIu64 "\n", i,
+                        s->get_refcount(*refcount_table, cluster));
                 res->corruptions++;
                 *rebuild = true;
             }
@@ -1633,7 +1667,7 @@ resize_fail:
  */
 static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                                BdrvCheckMode fix, bool *rebuild,
-                               uint16_t **refcount_table, int64_t *nb_clusters)
+                               void **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i;
@@ -1697,7 +1731,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                               BdrvCheckMode fix, bool *rebuild,
                               int64_t *highest_cluster,
-                              uint16_t *refcount_table, int64_t nb_clusters)
+                              void *refcount_table, int64_t nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i, refcount1, refcount2;
@@ -1712,7 +1746,7 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             continue;
         }
 
-        refcount2 = refcount_table[i];
+        refcount2 = s->get_refcount(refcount_table, i);
 
         if (refcount1 > 0 || refcount2 > 0) {
             *highest_cluster = i;
@@ -1770,7 +1804,7 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
  */
 static int64_t alloc_clusters_imrt(BlockDriverState *bs,
                                    int cluster_count,
-                                   uint16_t **refcount_table,
+                                   void **refcount_table,
                                    int64_t *imrt_nb_clusters,
                                    int64_t *first_free_cluster)
 {
@@ -1787,7 +1821,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
          contiguous_free_clusters < cluster_count;
          cluster++)
     {
-        if (!(*refcount_table)[cluster]) {
+        if (!s->get_refcount(*refcount_table, cluster)) {
             contiguous_free_clusters++;
             if (first_gap) {
                 /* If this is the first free cluster found, update
@@ -1825,7 +1859,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
     /* Go back to the first free cluster */
     cluster -= contiguous_free_clusters;
     for (i = 0; i < cluster_count; i++) {
-        (*refcount_table)[cluster + i] = 1;
+        s->set_refcount(*refcount_table, cluster + i, 1);
     }
 
     return cluster << s->cluster_bits;
@@ -1841,7 +1875,7 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
  */
 static int rebuild_refcount_structure(BlockDriverState *bs,
                                       BdrvCheckResult *res,
-                                      uint16_t **refcount_table,
+                                      void **refcount_table,
                                       int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1849,8 +1883,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
     int64_t refblock_offset, refblock_start, refblock_index;
     uint32_t reftable_size = 0;
     uint64_t *on_disk_reftable = NULL;
-    uint16_t *on_disk_refblock;
-    int i, ret = 0;
+    void *on_disk_refblock;
+    int ret = 0;
     struct {
         uint64_t reftable_offset;
         uint32_t reftable_clusters;
@@ -1860,7 +1894,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs,
 
 write_refblocks:
     for (; cluster < *nb_clusters; cluster++) {
-        if (!(*refcount_table)[cluster]) {
+        if (!s->get_refcount(*refcount_table, cluster)) {
             continue;
         }
 
@@ -1933,17 +1967,13 @@ write_refblocks:
             goto fail;
         }
 
-        on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
-        for (i = 0; i < s->refcount_block_size &&
-                    refblock_start + i < *nb_clusters; i++)
-        {
-            on_disk_refblock[i] =
-                cpu_to_be16((*refcount_table)[refblock_start + i]);
-        }
+        /* The size of *refcount_table is always cluster-aligned, therefore the
+         * write operation will not overflow */
+        on_disk_refblock = (void *)((uintptr_t)*refcount_table +
+                                    refblock_index * s->cluster_size);
 
         ret = bdrv_write(bs->file, refblock_offset / BDRV_SECTOR_SIZE,
-                         (void *)on_disk_refblock, s->cluster_sectors);
-        qemu_vfree(on_disk_refblock);
+                         on_disk_refblock, s->cluster_sectors);
         if (ret < 0) {
             fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
             goto fail;
@@ -2038,7 +2068,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcowState *s = bs->opaque;
     BdrvCheckResult pre_compare_res;
     int64_t size, highest_cluster, nb_clusters;
-    uint16_t *refcount_table = NULL;
+    void *refcount_table = NULL;
     bool rebuild = false;
     int ret;
 
@@ -2087,7 +2117,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         /* Because the old reftable has been exchanged for a new one the
          * references have to be recalculated */
         rebuild = false;
-        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
+        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
         ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
                                   &nb_clusters);
         if (ret < 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 0f8eb15..1c63221 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
     QTAILQ_ENTRY(Qcow2DiscardRegion) next;
 } Qcow2DiscardRegion;
 
+typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
+                                      uint64_t index);
+typedef void Qcow2SetRefcountFunc(void *refcount_array,
+                                  uint64_t index, uint64_t value);
+
 typedef struct BDRVQcowState {
     int cluster_bits;
     int cluster_size;
@@ -261,6 +266,9 @@ typedef struct BDRVQcowState {
     int refcount_bits;
     uint64_t refcount_max;
 
+    Qcow2GetRefcountFunc *get_refcount;
+    Qcow2SetRefcountFunc *set_refcount;
+
     bool discard_passthrough[QCOW2_DISCARD_MAX];
 
     int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 46+ messages in thread
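
The conversion above replaces direct uint16_t array accesses with calls
through s->get_refcount and s->set_refcount. A minimal sketch of that
pattern follows. RefcountOps, the *_8 and *_16 accessors, and
first_free_cluster() are hypothetical names used only for illustration;
they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the Qcow2GetRefcountFunc/Qcow2SetRefcountFunc typedefs. */
typedef uint64_t GetRefcountFunc(const void *refcount_array, uint64_t index);
typedef void SetRefcountFunc(void *refcount_array, uint64_t index,
                             uint64_t value);

/* Hypothetical stand-in for the two pointers added to BDRVQcowState. */
typedef struct RefcountOps {
    GetRefcountFunc *get_refcount;
    SetRefcountFunc *set_refcount;
} RefcountOps;

/* 8-bit entries: one byte per refcount. */
static uint64_t get_refcount_8(const void *refcount_array, uint64_t index)
{
    return ((const uint8_t *)refcount_array)[index];
}

static void set_refcount_8(void *refcount_array, uint64_t index,
                           uint64_t value)
{
    assert(!(value >> 8));
    ((uint8_t *)refcount_array)[index] = (uint8_t)value;
}

/* 16-bit entries, stored big-endian one byte at a time. */
static uint64_t get_refcount_16(const void *refcount_array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)refcount_array + 2 * index;
    return ((uint64_t)p[0] << 8) | p[1];
}

static void set_refcount_16(void *refcount_array, uint64_t index,
                            uint64_t value)
{
    uint8_t *p = (uint8_t *)refcount_array + 2 * index;
    assert(!(value >> 16));
    p[0] = (uint8_t)(value >> 8);
    p[1] = (uint8_t)(value & 0xff);
}

static const RefcountOps refcount_ops_8 = { get_refcount_8, set_refcount_8 };
static const RefcountOps refcount_ops_16 = { get_refcount_16, set_refcount_16 };

/* Width-agnostic caller: returns the index of the first cluster whose
 * refcount is zero, or nb_clusters if every cluster is in use. */
static uint64_t first_free_cluster(const RefcountOps *ops, const void *table,
                                   uint64_t nb_clusters)
{
    uint64_t cluster;

    for (cluster = 0; cluster < nb_clusters; cluster++) {
        if (!ops->get_refcount(table, cluster)) {
            return cluster;
        }
    }
    return nb_clusters;
}
```

The caller never learns the entry width; only the accessor pair chosen at
initialization time does.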

* [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers for refcount modification
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (6 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-15 17:08   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4 Max Reitz
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add helper functions for getting and setting refcounts in a refcount
array for any possible refcount order, and choose the correct one during
refcount initialization.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 146 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 144 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b3ca7d2..2e13a9c 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -32,10 +32,73 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length,
                             int addend, enum qcow2_discard_type type);
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index);
 static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro5(const void *refcount_array, uint64_t index);
+static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index);
 
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+                             uint64_t value);
 static void set_refcount_ro4(void *refcount_array, uint64_t index,
                              uint64_t value);
+static void set_refcount_ro5(void *refcount_array, uint64_t index,
+                             uint64_t value);
+static void set_refcount_ro6(void *refcount_array, uint64_t index,
+                             uint64_t value);
+
+static void get_refcount_functions(int refcount_order,
+                                   Qcow2GetRefcountFunc **get,
+                                   Qcow2SetRefcountFunc **set)
+{
+    switch (refcount_order) {
+        case 0:
+            *get = &get_refcount_ro0;
+            *set = &set_refcount_ro0;
+            break;
+
+        case 1:
+            *get = &get_refcount_ro1;
+            *set = &set_refcount_ro1;
+            break;
+
+        case 2:
+            *get = &get_refcount_ro2;
+            *set = &set_refcount_ro2;
+            break;
+
+        case 3:
+            *get = &get_refcount_ro3;
+            *set = &set_refcount_ro3;
+            break;
+
+        case 4:
+            *get = &get_refcount_ro4;
+            *set = &set_refcount_ro4;
+            break;
+
+        case 5:
+            *get = &get_refcount_ro5;
+            *set = &set_refcount_ro5;
+            break;
+
+        case 6:
+            *get = &get_refcount_ro6;
+            *set = &set_refcount_ro6;
+            break;
+
+        default:
+            abort();
+    }
+}
 
 
 /*********************************************************/
@@ -47,8 +110,8 @@ int qcow2_refcount_init(BlockDriverState *bs)
     unsigned int refcount_table_size2, i;
     int ret;
 
-    s->get_refcount = &get_refcount_ro4;
-    s->set_refcount = &set_refcount_ro4;
+    get_refcount_functions(s->refcount_order,
+                           &s->get_refcount, &s->set_refcount);
 
     assert(s->refcount_table_size <= INT_MAX / sizeof(uint64_t));
     refcount_table_size2 = s->refcount_table_size * sizeof(uint64_t);
@@ -80,6 +143,59 @@ void qcow2_refcount_close(BlockDriverState *bs)
 }
 
 
+static uint64_t get_refcount_ro0(const void *refcount_array, uint64_t index)
+{
+    return (((const uint8_t *)refcount_array)[index / 8] >> (index % 8)) & 0x1;
+}
+
+static void set_refcount_ro0(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 1));
+    ((uint8_t *)refcount_array)[index / 8] &= ~(0x1 << (index % 8));
+    ((uint8_t *)refcount_array)[index / 8] |= value << (index % 8);
+}
+
+static uint64_t get_refcount_ro1(const void *refcount_array, uint64_t index)
+{
+    return (((const uint8_t *)refcount_array)[index / 4] >> (2 * (index % 4)))
+           & 0x3;
+}
+
+static void set_refcount_ro1(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 2));
+    ((uint8_t *)refcount_array)[index / 4] &= ~(0x3 << (2 * (index % 4)));
+    ((uint8_t *)refcount_array)[index / 4] |= value << (2 * (index % 4));
+}
+
+static uint64_t get_refcount_ro2(const void *refcount_array, uint64_t index)
+{
+    return (((const uint8_t *)refcount_array)[index / 2] >> (4 * (index % 2)))
+           & 0xf;
+}
+
+static void set_refcount_ro2(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 4));
+    ((uint8_t *)refcount_array)[index / 2] &= ~(0xf << (4 * (index % 2)));
+    ((uint8_t *)refcount_array)[index / 2] |= value << (4 * (index % 2));
+}
+
+static uint64_t get_refcount_ro3(const void *refcount_array, uint64_t index)
+{
+    return ((const uint8_t *)refcount_array)[index];
+}
+
+static void set_refcount_ro3(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 8));
+    ((uint8_t *)refcount_array)[index] = value;
+}
+
 static uint64_t get_refcount_ro4(const void *refcount_array, uint64_t index)
 {
     return be16_to_cpu(((const uint16_t *)refcount_array)[index]);
@@ -92,6 +208,32 @@ static void set_refcount_ro4(void *refcount_array, uint64_t index,
     ((uint16_t *)refcount_array)[index] = cpu_to_be16(value);
 }
 
+static uint64_t get_refcount_ro5(const void *refcount_array, uint64_t index)
+{
+    return be32_to_cpu(((const uint32_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro5(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    assert(!(value >> 32));
+    ((uint32_t *)refcount_array)[index] = cpu_to_be32(value);
+}
+
+static uint64_t get_refcount_ro6(const void *refcount_array, uint64_t index)
+{
+    return be64_to_cpu(((const uint64_t *)refcount_array)[index]);
+}
+
+static void set_refcount_ro6(void *refcount_array, uint64_t index,
+                             uint64_t value)
+{
+    /* for 64 bit refcounts, refcount_max is INT64_MAX to prevent signed
+     * overflows (and to allow for -errno style return values) */
+    assert(!(value >> 63));
+    ((uint64_t *)refcount_array)[index] = cpu_to_be64(value);
+}
+
 
 static int load_refcount_block(BlockDriverState *bs,
                                int64_t refcount_block_offset,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 46+ messages in thread
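
The three sub-byte helpers in this patch (orders 0, 1 and 2) share one
bit-packing scheme: lower indices occupy the less significant bits of each
byte. As an illustration only, the sketch below folds them into a single
parameterized pair; get_packed_refcount() and set_packed_refcount() are
hypothetical names and do not appear in the patch, which keeps one function
per order.

```c
#include <assert.h>
#include <stdint.h>

/* Read entry `index` from an array of `bits`-wide refcounts, where bits
 * is 1, 2 or 4. */
static uint64_t get_packed_refcount(const void *array, uint64_t index,
                                    unsigned bits)
{
    unsigned per_byte = 8 / bits;
    unsigned shift = bits * (unsigned)(index % per_byte);
    unsigned mask = (1u << bits) - 1;

    return (((const uint8_t *)array)[index / per_byte] >> shift) & mask;
}

/* Write `value` into entry `index`, leaving neighbouring entries in the
 * same byte untouched. */
static void set_packed_refcount(void *array, uint64_t index, unsigned bits,
                                uint64_t value)
{
    unsigned per_byte = 8 / bits;
    unsigned shift = bits * (unsigned)(index % per_byte);
    unsigned mask = (1u << bits) - 1;
    uint8_t *byte = &((uint8_t *)array)[index / per_byte];

    assert(bits == 1 || bits == 2 || bits == 4);
    assert(!(value >> bits)); /* value must fit into the entry */
    *byte = (uint8_t)((*byte & ~(mask << shift)) | (value << shift));
}
```

With bits=2, storing 3, 1 and 2 at indices 0, 1 and 3 yields the byte 0x87:
entry 0 in bits 0-1, entry 1 in bits 2-3, entry 3 in bits 6-7.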

* [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (7 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers " Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-15 17:09   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

No longer refuse to open images with a refcount entry width other than
16 bits; only reject images with a refcount width larger than 64 bits
(which is prohibited by the specification).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d70e927..528d696 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     /* Check support for various header values */
-    if (header.refcount_order != 4) {
-        report_unsupported(bs, errp, "%d bit reference counts",
-                           1 << header.refcount_order);
-        ret = -ENOTSUP;
+    if (header.refcount_order > 6) {
+        error_setg(errp, "Reference count entry width too large; may not "
+                   "exceed 64 bit");
+        ret = -EINVAL;
         goto fail;
     }
     s->refcount_order = header.refcount_order;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 46+ messages in thread
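
For reference, the relation between the header's refcount_order and the
entry width checked above can be sketched as follows. Both function names
are hypothetical. The INT64_MAX cap for 64-bit entries is taken from the
comment in set_refcount_ro6() in patch 08; everything else is a plain
illustration rather than the series' actual code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REFCOUNT_ORDER 6 /* 1 << 6 == 64 bits */

/* Returns the entry width in bits, or -1 if the order is out of range. */
static int refcount_order_to_bits(uint32_t refcount_order)
{
    if (refcount_order > MAX_REFCOUNT_ORDER) {
        return -1;
    }
    return 1 << refcount_order;
}

/* Largest refcount storable in an entry of the given width. 64-bit
 * entries are capped at INT64_MAX so that values fit into int64_t. */
static uint64_t refcount_max_for_bits(int bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) {
        return INT64_MAX;
    }
    return (UINT64_C(1) << bits) - 1;
}
```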

* [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (8 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4 Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-15 17:13   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option Max Reitz
                   ` (10 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a refcount_order parameter to qcow2_create2(), and use that value
for the image header and for calculating the size required for
preallocation.

For now, always pass 4.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 41 ++++++++++++++++++++++++++++++-----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 528d696..6dc1984 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1775,7 +1775,7 @@ static int preallocate(BlockDriverState *bs)
 static int qcow2_create2(const char *filename, int64_t total_size,
                          const char *backing_file, const char *backing_format,
                          int flags, size_t cluster_size, PreallocMode prealloc,
-                         QemuOpts *opts, int version,
+                         QemuOpts *opts, int version, int refcount_order,
                          Error **errp)
 {
     /* Calculate cluster_bits */
@@ -1811,6 +1811,13 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         int64_t meta_size = 0;
         uint64_t nreftablee, nrefblocke, nl1e, nl2e;
         int64_t aligned_total_size = align_offset(total_size, cluster_size);
+        int refblock_bits, refblock_size;
+        /* refcount entry size in bytes */
+        double rces = (1 << refcount_order) / 8.;
+
+        /* see qcow2_open() */
+        refblock_bits = cluster_bits - (refcount_order - 3);
+        refblock_size = 1 << refblock_bits;
 
         /* header: 1 cluster */
         meta_size += cluster_size;
@@ -1835,20 +1842,20 @@ static int qcow2_create2(const char *filename, int64_t total_size,
          *   c = cluster size
          *   y1 = number of refcount blocks entries
          *   y2 = meta size including everything
+         *   rces = refcount entry size in bytes
          * then,
          *   y1 = (y2 + a)/c
-         *   y2 = y1 * sizeof(u16) + y1 * sizeof(u16) * sizeof(u64) / c + m
+         *   y2 = y1 * rces + y1 * rces * sizeof(u64) / c + m
          * we can get y1:
-         *   y1 = (a + m) / (c - sizeof(u16) - sizeof(u16) * sizeof(u64) / c)
+         *   y1 = (a + m) / (c - rces - rces * sizeof(u64) / c)
          */
-        nrefblocke = (aligned_total_size + meta_size + cluster_size) /
-            (cluster_size - sizeof(uint16_t) -
-             1.0 * sizeof(uint16_t) * sizeof(uint64_t) / cluster_size);
-        nrefblocke = align_offset(nrefblocke, cluster_size / sizeof(uint16_t));
-        meta_size += nrefblocke * sizeof(uint16_t);
+        nrefblocke = (aligned_total_size + meta_size + cluster_size)
+                   / (cluster_size - rces - rces * sizeof(uint64_t)
+                                                 / cluster_size);
+        meta_size += DIV_ROUND_UP(nrefblocke, refblock_size) * cluster_size;
 
         /* total size of refcount tables */
-        nreftablee = nrefblocke * sizeof(uint16_t) / cluster_size;
+        nreftablee = nrefblocke / refblock_size;
         nreftablee = align_offset(nreftablee, cluster_size / sizeof(uint64_t));
         meta_size += nreftablee * sizeof(uint64_t);
 
@@ -1883,7 +1890,7 @@ static int qcow2_create2(const char *filename, int64_t total_size,
         .l1_size                    = cpu_to_be32(0),
         .refcount_table_offset      = cpu_to_be64(cluster_size),
         .refcount_table_clusters    = cpu_to_be32(1),
-        .refcount_order             = cpu_to_be32(4),
+        .refcount_order             = cpu_to_be32(refcount_order),
         .header_length              = cpu_to_be32(sizeof(*header)),
     };
 
@@ -2003,6 +2010,7 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
     size_t cluster_size = DEFAULT_CLUSTER_SIZE;
     PreallocMode prealloc;
     int version = 3;
+    int refcount_width = 16, refcount_order;
     Error *local_err = NULL;
     int ret;
 
@@ -2057,8 +2065,19 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
         goto finish;
     }
 
+    if (version < 3 && refcount_width != 16) {
+        error_setg(errp, "Different refcount widths than 16 bits require "
+                   "compatibility level 1.1 or above (use compat=1.1 or "
+                   "greater)");
+        ret = -EINVAL;
+        goto finish;
+    }
+
+    refcount_order = ffs(refcount_width) - 1;
+
     ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
-                        cluster_size, prealloc, opts, version, &local_err);
+                        cluster_size, prealloc, opts, version, refcount_order,
+                        &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
     }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 46+ messages in thread
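
The arithmetic this patch relies on can be sketched in isolation: turning a
width in bits into an order, and deriving the number of entries per
refcount block from cluster_bits. The function names are hypothetical, and
the width validation mirrors the check that patch 12 adds to
qcow2_create(); the patch itself uses ffs() for the conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Returns log2(width) for a power-of-two width in 1..64, else -1. */
static int refcount_width_to_order(int width)
{
    int order = 0;

    if (width <= 0 || width > 64 || (width & (width - 1))) {
        return -1;
    }
    while ((1 << order) < width) {
        order++;
    }
    return order;
}

/* log2 of the number of refcount entries in one refcount block: a cluster
 * holds 2^cluster_bits bytes, an entry takes 2^(refcount_order - 3) bytes. */
static int refblock_bits(int cluster_bits, int refcount_order)
{
    return cluster_bits - (refcount_order - 3);
}

/* Number of refcount blocks needed to cover nb_entries clusters. */
static uint64_t refblocks_needed(uint64_t nb_entries, int cluster_bits,
                                 int refcount_order)
{
    uint64_t refblock_size =
        UINT64_C(1) << refblock_bits(cluster_bits, refcount_order);

    return (nb_entries + refblock_size - 1) / refblock_size;
}
```

With 64 kB clusters (cluster_bits=16), a refcount block holds 32768 entries
at 16 bits per entry, 524288 at 1 bit and 8192 at 64 bits.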

* [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (9 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-15 17:17   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Some tests do not work well with certain refcount widths (e.g. you
cannot create internal snapshots with refcount_width=1), so make those
widths unsupported.

Furthermore, add another filter to _filter_img_create in common.filter
which filters out the refcount_width value.

This is necessary for test 079, which does work with any refcount
width; however, invoking qemu-img directly makes the refcount_width
value visible in the output. Use _make_test_img instead, which filters
it out.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/007           |  3 +++
 tests/qemu-iotests/015           |  2 ++
 tests/qemu-iotests/026           |  7 +++++++
 tests/qemu-iotests/029           |  1 +
 tests/qemu-iotests/051           |  3 +++
 tests/qemu-iotests/058           |  2 ++
 tests/qemu-iotests/067           |  2 ++
 tests/qemu-iotests/079           | 10 ++--------
 tests/qemu-iotests/079.out       | 38 ++++++++++----------------------------
 tests/qemu-iotests/080           |  2 ++
 tests/qemu-iotests/089           |  2 ++
 tests/qemu-iotests/108           |  2 ++
 tests/qemu-iotests/common.filter |  3 ++-
 13 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007
index fe1a743..8d92490 100755
--- a/tests/qemu-iotests/007
+++ b/tests/qemu-iotests/007
@@ -43,6 +43,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+# refcount_width must be at least 4 bits so we can create ten internal snapshots
+# (1 bit supports none, 2 bits support two, 4 bits support 14)
+_unsupported_imgopts 'refcount_width=\(1\|2\)[^0-9]'
 
 echo
 echo "creating image"
diff --git a/tests/qemu-iotests/015 b/tests/qemu-iotests/015
index 099d757..040ca22 100755
--- a/tests/qemu-iotests/015
+++ b/tests/qemu-iotests/015
@@ -43,6 +43,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+# Internal snapshots are (currently) impossible with refcount_width=1
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 echo
 echo "creating image"
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index df2884b..3b7a07f 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -46,6 +46,13 @@ _supported_proto file
 _supported_os Linux
 _default_cache_mode "writethrough"
 _supported_cache_modes "writethrough" "none"
+# The refcount table tests expect a certain minimum width for refcount entries
+# (so that the refcount table actually needs to grow); that minimum is 16 bits,
+# being the default refcount entry width.
+# 32 and 64 bits do not work either, however, due to different leaked cluster
+# count on error.
+# Thus, the only remaining option is refcount_width=16.
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 echo "Errors while writing 128 kB"
 echo
diff --git a/tests/qemu-iotests/029 b/tests/qemu-iotests/029
index fa46ace..aa416a6 100755
--- a/tests/qemu-iotests/029
+++ b/tests/qemu-iotests/029
@@ -44,6 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto generic
 _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 offset_size=24
 offset_l1_size=36
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 11c858f..8e7d326 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -41,6 +41,9 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# A compat=0.10 image is created in this test which does not support anything
+# other than refcount_width=16
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 function do_run_qemu()
 {
diff --git a/tests/qemu-iotests/058 b/tests/qemu-iotests/058
index 14584cd..3b03d8e 100755
--- a/tests/qemu-iotests/058
+++ b/tests/qemu-iotests/058
@@ -88,6 +88,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _require_command QEMU_NBD
+# Internal snapshots are (currently) impossible with refcount_width=1
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 echo
 echo "== preparing image =="
diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index 29cd6b5..7fd5f65 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -35,6 +35,8 @@ status=1	# failure is the default!
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# Because anything other than 16 would change the output of query-block
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 function do_run_qemu()
 {
diff --git a/tests/qemu-iotests/079 b/tests/qemu-iotests/079
index 6613cfb..ade6efa 100755
--- a/tests/qemu-iotests/079
+++ b/tests/qemu-iotests/079
@@ -42,19 +42,13 @@ _supported_fmt qcow2
 _supported_proto file nfs
 _supported_os Linux
 
-function test_qemu_img()
-{
-    echo qemu-img "$@" | _filter_testdir
-    $QEMU_IMG "$@" 2>&1 | _filter_testdir
-    echo
-}
-
 echo "=== Check option preallocation and cluster_size ==="
 echo
 cluster_sizes="16384 32768 65536 131072 262144 524288 1048576 2097152 4194304"
 
 for s in $cluster_sizes; do
-    test_qemu_img create -f $IMGFMT -o preallocation=metadata,cluster_size=$s "$TEST_IMG" 4G
+    IMGOPTS=$(_optstr_add "$IMGOPTS" "preallocation=metadata,cluster_size=$s") \
+        _make_test_img 4G
 done
 
 # success, all done
diff --git a/tests/qemu-iotests/079.out b/tests/qemu-iotests/079.out
index ef4b8c9..6dc5d57 100644
--- a/tests/qemu-iotests/079.out
+++ b/tests/qemu-iotests/079.out
@@ -1,32 +1,14 @@
 QA output created by 079
 === Check option preallocation and cluster_size ===
 
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=16384 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=16384 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=32768 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=32768 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=65536 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=131072 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=131072 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=262144 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=262144 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=524288 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=524288 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=1048576 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=1048576 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=2097152 TEST_DIR/t.qcow2 4G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=2097152 preallocation='metadata' lazy_refcounts=off
-
-qemu-img create -f qcow2 -o preallocation=metadata,cluster_size=4194304 TEST_DIR/t.qcow2 4G
-qemu-img: TEST_DIR/t.qcow2: Cluster size must be a power of two between 512 and 2048k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=4294967296 encryption=off cluster_size=4194304 preallocation='metadata' lazy_refcounts=off
-
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
+qemu-img: TEST_DIR/t.IMGFMT: Cluster size must be a power of two between 512 and 2048k
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4294967296 preallocation='metadata'
 *** done
diff --git a/tests/qemu-iotests/080 b/tests/qemu-iotests/080
index 9de337c..0fa90c1 100755
--- a/tests/qemu-iotests/080
+++ b/tests/qemu-iotests/080
@@ -42,6 +42,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# Internal snapshots are (currently) impossible with refcount_width=1
+_unsupported_imgopts 'refcount_width=1[^0-9]'
 
 header_size=104
 
diff --git a/tests/qemu-iotests/089 b/tests/qemu-iotests/089
index dffc977..b2da188 100755
--- a/tests/qemu-iotests/089
+++ b/tests/qemu-iotests/089
@@ -41,6 +41,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# Because anything other than 16 would change the output of qemu_io -c info
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 # Using an image filename containing quotation marks will render the JSON data
 # below invalid. In that case, we have little choice but simply not to run this
diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108
index 12fc92a..88f9b3c 100755
--- a/tests/qemu-iotests/108
+++ b/tests/qemu-iotests/108
@@ -43,6 +43,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux
+# This test directly modifies a refblock so it relies on refcount_width being 16
+_unsupported_imgopts 'refcount_width=\([^1]\|.\([^6]\|$\)\)'
 
 echo
 echo '=== Repairing an image without any refcount table ==='
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 93642f3..121687a 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -190,7 +190,8 @@ _filter_img_create()
         -e "s# block_size=[0-9]\\+##g" \
         -e "s# block_state_zero=\\(on\\|off\\)##g" \
         -e "s# log_size=[0-9]\\+##g" \
-        -e "s/archipelago:a/TEST_DIR\//g"
+        -e "s/archipelago:a/TEST_DIR\//g" \
+        -e "s# refcount_width=[0-9]\\+##g"
 }
 
 _filter_img_info()
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 46+ messages in thread
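
The pattern 'refcount_width=\([^1]\|.\([^6]\|$\)\)' used by several tests
above is meant to reject every width except 16. The sketch below checks
that reading with POSIX regcomp(), using an extended-regex translation of
the pattern. It assumes the pattern is matched against the plain image
option string; imgopts_unsupported() is a hypothetical helper, not part of
the iotests.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the option string names a refcount_width other than 16.
 * The pattern is the ERE form of the basic regex used in the tests. */
static bool imgopts_unsupported(const char *imgopts)
{
    static const char pattern[] = "refcount_width=([^1]|.([^6]|$))";
    regex_t re;
    bool matched;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return false; /* pattern failed to compile; treat as no match */
    }
    matched = regexec(&re, imgopts, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

The first alternative catches widths whose first digit is not 1 (2, 4, 8,
32, 64); the second catches a leading 1 that is not followed by 6, which
covers refcount_width=1 at the end of the string.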

* [Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (10 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB Max Reitz
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a creation option to qcow2 for setting the refcount order of images
to be created, and respect that option's value.

This breaks some test outputs; fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
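For illustration only, the acceptance rule added to qcow2_create() below can be mirrored in a few lines of shell. This is a sketch, not part of the patch; the function name refcount_width_ok and its second argument (the qcow2 version, 2 for compat=0.10 and 3 for compat=1.1) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper mirroring the checks in the qcow2_create() hunk:
#   $1 = refcount width in bits, $2 = qcow2 version (2 or 3)
# Returns 0 if the combination would be accepted, 1 otherwise.
refcount_width_ok() {
    width=$1
    version=$2

    # Must be positive and must not exceed 64 bits.
    if [ "$width" -le 0 ] || [ "$width" -gt 64 ]; then
        return 1
    fi
    # Must be a power of two, i.e. have exactly one bit set.
    if [ $(( width & (width - 1) )) -ne 0 ]; then
        return 1
    fi
    # Version 2 images only support 16 bit refcount entries.
    if [ "$version" -lt 3 ] && [ "$width" -ne 16 ]; then
        return 1
    fi
    return 0
}

# Try a few widths against a version 3 image.
for width in 0 1 12 16 64 128; do
    if refcount_width_ok "$width" 3; then
        echo "width=$width: accepted for version 3"
    else
        echo "width=$width: rejected for version 3"
    fi
done
```
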
 block/qcow2.c              |  20 ++++++++
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/049.out | 112 ++++++++++++++++++++++-----------------------
 tests/qemu-iotests/082.out |  41 ++++++++++++++---
 tests/qemu-iotests/085.out |  38 +++++++--------
 5 files changed, 130 insertions(+), 82 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6dc1984..657d558 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2065,6 +2065,17 @@ static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
         goto finish;
     }
 
+    refcount_width = qemu_opt_get_number_del(opts, BLOCK_OPT_REFCOUNT_WIDTH,
+                                             refcount_width);
+    if (refcount_width <= 0 || refcount_width > 64 ||
+        !is_power_of_2(refcount_width))
+    {
+        error_setg(errp, "Refcount width must be a power of two and may not "
+                   "exceed 64 bits");
+        ret = -EINVAL;
+        goto finish;
+    }
+
     if (version < 3 && refcount_width != 16) {
         error_setg(errp, "Different refcount widths than 16 bits require "
                    "compatibility level 1.1 or above (use compat=1.1 or "
@@ -2704,6 +2715,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         } else if (!strcmp(desc->name, "lazy_refcounts")) {
             lazy_refcounts = qemu_opt_get_bool(opts, "lazy_refcounts",
                                                lazy_refcounts);
+        } else if (!strcmp(desc->name, "refcount_width")) {
+            error_report("Cannot change refcount entry width");
+            return -ENOTSUP;
         } else {
             /* if this assertion fails, this probably means a new option was
              * added without having it covered here */
@@ -2873,6 +2887,12 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_REFCOUNT_WIDTH,
+            .type = QEMU_OPT_NUMBER,
+            .help = "Width of a reference count entry in bits",
+            .def_value_str = "16"
+        },
         { /* end of list */ }
     }
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a1c17b9..c34d610 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -56,6 +56,7 @@
 #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
 #define BLOCK_OPT_REDUNDANCY        "redundancy"
 #define BLOCK_OPT_NOCOW             "nocow"
+#define BLOCK_OPT_REFCOUNT_WIDTH    "refcount_width"
 
 typedef struct BdrvTrackedRequest {
     BlockDriverState *bs;
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 09ca0ae..9369c12 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == 2. Specifying size via -o ==
 
 qemu-img create -f qcow2 -o size=1024 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1024b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1024.0 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1024.0b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o size=1.5T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == 3. Invalid sizes ==
 
@@ -97,7 +97,7 @@ qemu-img: Image size must be less than 8 EiB!
 qemu-img create -f qcow2 -o size=-1024 TEST_DIR/t.qcow2
 qemu-img: qcow2 doesn't support shrinking images yet
 qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- -1k
 qemu-img: Image size must be less than 8 EiB!
@@ -105,17 +105,17 @@ qemu-img: Image size must be less than 8 EiB!
 qemu-img create -f qcow2 -o size=-1k TEST_DIR/t.qcow2
 qemu-img: qcow2 doesn't support shrinking images yet
 qemu-img: TEST_DIR/t.qcow2: Could not resize image: Operation not supported
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=-1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
-qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for 
+qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
 qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
 
 qemu-img create -f qcow2 -o size=1kilobyte TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- foobar
-qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for 
+qemu-img: Invalid image size specified! You may use k, M, G, T, P or E suffixes for
 qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.
 
 qemu-img create -f qcow2 -o size=foobar TEST_DIR/t.qcow2
@@ -125,84 +125,84 @@ qemu-img: TEST_DIR/t.qcow2: Invalid options for file format 'qcow2'
 == Check correct interpretation of suffixes for cluster size ==
 
 qemu-img create -f qcow2 -o cluster_size=1024 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1024b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1048576 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1048576 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=1024 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=512 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=524288 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=524288 lazy_refcounts=off refcount_width=16
 
 == Check compat level option ==
 
 qemu-img create -f qcow2 -o compat=0.10 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=1.1 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=0.42 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid compatibility level: '0.42'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.42' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.42' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=foobar TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid compatibility level: 'foobar'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='foobar' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='foobar' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == Check preallocation option ==
 
 qemu-img create -f qcow2 -o preallocation=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='off' lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='off' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=metadata TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='metadata' lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: invalid parameter value: 1234
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='1234' lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 preallocation='1234' lazy_refcounts=off refcount_width=16
 
 == Check encryption option ==
 
 qemu-img create -f qcow2 -o encryption=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o encryption=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 == Check lazy_refcounts option (only with v3) ==
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='1.1' encryption=off cluster_size=65536 lazy_refcounts=on refcount_width=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=on TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat='0.10' encryption=off cluster_size=65536 lazy_refcounts=on refcount_width=16
 
 *** done
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 4b14b4f..dc8bdd3 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -3,14 +3,14 @@ QA output created by 082
 === create: Options specified more than once ===
 
 Testing: create -f foo -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128M (134217728 bytes)
 cluster_size: 65536
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=4096 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=4096 lazy_refcounts=on refcount_width=16
 
 Testing: info TEST_DIR/t.qcow2
 image: TEST_DIR/t.qcow2
@@ -25,7 +25,7 @@ Format specific information:
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=on 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=on refcount_width=16
 
 Testing: info TEST_DIR/t.qcow2
 image: TEST_DIR/t.qcow2
@@ -40,7 +40,7 @@ Format specific information:
     corrupt: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=8192 lazy_refcounts=off refcount_width=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128M (134217728 bytes)
@@ -58,6 +58,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o ? TEST_DIR/t.qcow2 128M
@@ -70,6 +71,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k,help TEST_DIR/t.qcow2 128M
@@ -82,6 +84,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k,? TEST_DIR/t.qcow2 128M
@@ -94,6 +97,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o help,cluster_size=4k TEST_DIR/t.qcow2 128M
@@ -106,6 +110,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o ?,cluster_size=4k TEST_DIR/t.qcow2 128M
@@ -118,6 +123,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k -o help TEST_DIR/t.qcow2 128M
@@ -130,6 +136,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o cluster_size=4k -o ? TEST_DIR/t.qcow2 128M
@@ -142,13 +149,14 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,help' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,help' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2,,? TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,?' encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2,?' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2, -o help TEST_DIR/t.qcow2 128M
 qemu-img: Invalid option list: backing_file=TEST_DIR/t.qcow2,
@@ -169,6 +177,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 
 Testing: create -o help
 Supported options:
@@ -177,7 +186,7 @@ size             Virtual disk size
 === convert: Options specified more than once ===
 
 Testing: create -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off 
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 
 Testing: convert -f foo -f qcow2 TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -236,6 +245,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o ? TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -248,6 +258,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k,help TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -260,6 +271,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k,? TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -272,6 +284,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o help,cluster_size=4k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -284,6 +297,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o ?,cluster_size=4k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -296,6 +310,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k -o help TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -308,6 +323,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o cluster_size=4k -o ? TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -320,6 +336,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: convert -O qcow2 -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
@@ -347,6 +364,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 
 Testing: convert -o help
 Supported options:
@@ -414,6 +432,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o ? TEST_DIR/t.qcow2
@@ -426,6 +445,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k,help TEST_DIR/t.qcow2
@@ -438,6 +458,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k,? TEST_DIR/t.qcow2
@@ -450,6 +471,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o help,cluster_size=4k TEST_DIR/t.qcow2
@@ -462,6 +484,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o ?,cluster_size=4k TEST_DIR/t.qcow2
@@ -474,6 +497,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k -o help TEST_DIR/t.qcow2
@@ -486,6 +510,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o cluster_size=4k -o ? TEST_DIR/t.qcow2
@@ -498,6 +523,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 nocow            Turn off copy-on-write (valid only on btrfs)
 
 Testing: amend -f qcow2 -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2
@@ -527,6 +553,7 @@ encryption       Encrypt the image
 cluster_size     qcow2 cluster size
 preallocation    Preallocation mode (allowed values: off, metadata, falloc, full)
 lazy_refcounts   Postpone refcount updates
+refcount_width   Width of a reference count entry in bits
 
 Testing: convert -o help
 Supported options:
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index 0f2b17f..2e86fb7 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -11,7 +11,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 
 === Create a single snapshot on virtio0 ===
 
-Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2.orig' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2.orig' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
 
 === Invalid command - missing device and nodename ===
@@ -25,31 +25,31 @@ Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file
 
 === Create several transactional group snapshots ===
 
-Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/1-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/1-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/t.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/2-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/3-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/4-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/5-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/6-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/7-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/8-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
-Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
-Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
+Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v0.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
+Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file='TEST_DIR/9-snapshot-v1.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 lazy_refcounts=off refcount_width=16
 {"return": {}}
 *** done
-- 
1.9.3


* [Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (11 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add an opaque value which is to be passed to the bdrv_amend_options()
status callback.
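
The effect of the extra argument can be sketched outside of QEMU. The
following is a minimal, self-contained illustration and not code from this
patch: status_cb_fn, progress_state, record_progress() and do_work() are
hypothetical names. Only the idea of forwarding a caller-supplied void *
unchanged to the status callback is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockDriverAmendStatusCB: the last parameter is
 * an opaque pointer owned by whoever registered the callback. */
typedef void status_cb_fn(int64_t offset, int64_t total_work_size,
                          void *opaque);

/* Hypothetical caller-side state reached through the opaque pointer. */
typedef struct progress_state {
    int calls;           /* number of times the callback was invoked */
    int64_t last_offset; /* most recent offset reported */
    int64_t last_total;  /* most recent total work size reported */
} progress_state;

/* Example callback: converts the opaque pointer back to its real type. */
static void record_progress(int64_t offset, int64_t total_work_size,
                            void *opaque)
{
    progress_state *st = opaque;

    st->calls++;
    st->last_offset = offset;
    st->last_total = total_work_size;
}

/* Hypothetical long-running operation: reports progress after each step and
 * hands cb_opaque back unchanged, without ever interpreting it. */
static int do_work(int64_t steps, status_cb_fn *status_cb, void *cb_opaque)
{
    int64_t i;

    assert(steps >= 0);

    for (i = 0; i < steps; i++) {
        if (status_cb) {
            status_cb(i + 1, steps, cb_opaque);
        }
    }
    return 0;
}
```

A caller that needs no state can keep passing NULL as the opaque value, which
is what qemu-img does in this patch.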

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block.c                   |  4 ++--
 block/qcow2-cluster.c     | 14 ++++++++------
 block/qcow2.c             |  9 +++++----
 block/qcow2.h             |  3 ++-
 include/block/block.h     |  4 ++--
 include/block/block_int.h |  3 ++-
 qemu-img.c                |  5 +++--
 7 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index c979d51..c34b188 100644
--- a/block.c
+++ b/block.c
@@ -5790,12 +5790,12 @@ void bdrv_add_before_write_notifier(BlockDriverState *bs,
 }
 
 int bdrv_amend_options(BlockDriverState *bs, QemuOpts *opts,
-                       BlockDriverAmendStatusCB *status_cb)
+                       BlockDriverAmendStatusCB *status_cb, void *cb_opaque)
 {
     if (!bs->drv->bdrv_amend_options) {
         return -ENOTSUP;
     }
-    return bs->drv->bdrv_amend_options(bs, opts, status_cb);
+    return bs->drv->bdrv_amend_options(bs, opts, status_cb, cb_opaque);
 }
 
 /* This function will be called by the bdrv_recurse_is_first_non_filter method
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ab43902..2daf334 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1620,7 +1620,8 @@ fail:
 static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                                       int l1_size, int64_t *visited_l1_entries,
                                       int64_t l1_entries,
-                                      BlockDriverAmendStatusCB *status_cb)
+                                      BlockDriverAmendStatusCB *status_cb,
+                                      void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     bool is_active_l1 = (l1_table == s->l1_table);
@@ -1646,7 +1647,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             /* unallocated */
             (*visited_l1_entries)++;
             if (status_cb) {
-                status_cb(bs, *visited_l1_entries, l1_entries);
+                status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
             }
             continue;
         }
@@ -1768,7 +1769,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
 
         (*visited_l1_entries)++;
         if (status_cb) {
-            status_cb(bs, *visited_l1_entries, l1_entries);
+            status_cb(bs, *visited_l1_entries, l1_entries, cb_opaque);
         }
     }
 
@@ -1797,7 +1798,8 @@ fail:
  * qcow2 version which doesn't yet support metadata zero clusters.
  */
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
-                               BlockDriverAmendStatusCB *status_cb)
+                               BlockDriverAmendStatusCB *status_cb,
+                               void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t *l1_table = NULL;
@@ -1814,7 +1816,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
     ret = expand_zero_clusters_in_l1(bs, s->l1_table, s->l1_size,
                                      &visited_l1_entries, l1_entries,
-                                     status_cb);
+                                     status_cb, cb_opaque);
     if (ret < 0) {
         goto fail;
     }
@@ -1849,7 +1851,7 @@ int qcow2_expand_zero_clusters(BlockDriverState *bs,
 
         ret = expand_zero_clusters_in_l1(bs, l1_table, s->snapshots[i].l1_size,
                                          &visited_l1_entries, l1_entries,
-                                         status_cb);
+                                         status_cb, cb_opaque);
         if (ret < 0) {
             goto fail;
         }
diff --git a/block/qcow2.c b/block/qcow2.c
index 657d558..d084485 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2592,7 +2592,7 @@ static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,
  * have to be removed.
  */
 static int qcow2_downgrade(BlockDriverState *bs, int target_version,
-                           BlockDriverAmendStatusCB *status_cb)
+                           BlockDriverAmendStatusCB *status_cb, void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     int current_version = s->qcow_version;
@@ -2641,7 +2641,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     /* clearing autoclear features is trivial */
     s->autoclear_features = 0;
 
-    ret = qcow2_expand_zero_clusters(bs, status_cb);
+    ret = qcow2_expand_zero_clusters(bs, status_cb, cb_opaque);
     if (ret < 0) {
         return ret;
     }
@@ -2656,7 +2656,8 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
 }
 
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
-                               BlockDriverAmendStatusCB *status_cb)
+                               BlockDriverAmendStatusCB *status_cb,
+                               void *cb_opaque)
 {
     BDRVQcowState *s = bs->opaque;
     int old_version = s->qcow_version, new_version = old_version;
@@ -2737,7 +2738,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
                 return ret;
             }
         } else {
-            ret = qcow2_downgrade(bs, new_version, status_cb);
+            ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
             if (ret < 0) {
                 return ret;
             }
diff --git a/block/qcow2.h b/block/qcow2.h
index 1c63221..fe12c54 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -551,7 +551,8 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
 int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
 
 int qcow2_expand_zero_clusters(BlockDriverState *bs,
-                               BlockDriverAmendStatusCB *status_cb);
+                               BlockDriverAmendStatusCB *status_cb,
+                               void *cb_opaque);
 
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
diff --git a/include/block/block.h b/include/block/block.h
index 287dcab..5fd4c81 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -272,9 +272,9 @@ int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res, BdrvCheckMode fix);
  * block driver; total_work_size may change during the course of the amendment
  * operation */
 typedef void BlockDriverAmendStatusCB(BlockDriverState *bs, int64_t offset,
-                                      int64_t total_work_size);
+                                      int64_t total_work_size, void *opaque);
 int bdrv_amend_options(BlockDriverState *bs_new, QemuOpts *opts,
-                       BlockDriverAmendStatusCB *status_cb);
+                       BlockDriverAmendStatusCB *status_cb, void *cb_opaque);
 
 /* external snapshots */
 bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c34d610..e2167ab 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -234,7 +234,8 @@ struct BlockDriver {
         BdrvCheckMode fix);
 
     int (*bdrv_amend_options)(BlockDriverState *bs, QemuOpts *opts,
-                              BlockDriverAmendStatusCB *status_cb);
+                              BlockDriverAmendStatusCB *status_cb,
+                              void *cb_opaque);
 
     void (*bdrv_debug_event)(BlockDriverState *bs, BlkDebugEvent event);
 
diff --git a/qemu-img.c b/qemu-img.c
index a42335c..e0595fe 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2869,7 +2869,8 @@ out:
 }
 
 static void amend_status_cb(BlockDriverState *bs,
-                            int64_t offset, int64_t total_work_size)
+                            int64_t offset, int64_t total_work_size,
+                            void *opaque)
 {
     qemu_progress_print(100.f * offset / total_work_size, 0);
 }
@@ -2982,7 +2983,7 @@ static int img_amend(int argc, char **argv)
 
     /* In case the driver does not call amend_status_cb() */
     qemu_progress_print(0.f, 0);
-    ret = bdrv_amend_options(bs, opts, &amend_status_cb);
+    ret = bdrv_amend_options(bs, opts, &amend_status_cb, NULL);
     qemu_progress_print(100.f, 0);
     if (ret < 0) {
         error_report("Error while amending options: %s", strerror(-ret));
-- 
1.9.3


* [Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options()
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (12 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c              | 14 ++++++--------
 tests/qemu-iotests/061.out | 14 +++++++-------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d084485..e82120c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2686,11 +2686,11 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
             } else if (!strcmp(compat, "1.1")) {
                 new_version = 3;
             } else {
-                fprintf(stderr, "Unknown compatibility level %s.\n", compat);
+                error_report("Unknown compatibility level %s", compat);
                 return -EINVAL;
             }
         } else if (!strcmp(desc->name, "preallocation")) {
-            fprintf(stderr, "Cannot change preallocation mode.\n");
+            error_report("Cannot change preallocation mode");
             return -ENOTSUP;
         } else if (!strcmp(desc->name, "size")) {
             new_size = qemu_opt_get_size(opts, "size", 0);
@@ -2701,16 +2701,14 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         } else if (!strcmp(desc->name, "encryption")) {
             encrypt = qemu_opt_get_bool(opts, "encryption", s->crypt_method);
             if (encrypt != !!s->crypt_method) {
-                fprintf(stderr, "Changing the encryption flag is not "
-                        "supported.\n");
+                error_report("Changing the encryption flag is not supported");
                 return -ENOTSUP;
             }
         } else if (!strcmp(desc->name, "cluster_size")) {
             cluster_size = qemu_opt_get_size(opts, "cluster_size",
                                              cluster_size);
             if (cluster_size != s->cluster_size) {
-                fprintf(stderr, "Changing the cluster size is not "
-                        "supported.\n");
+                error_report("Changing the cluster size is not supported");
                 return -ENOTSUP;
             }
         } else if (!strcmp(desc->name, "lazy_refcounts")) {
@@ -2756,8 +2754,8 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     if (s->use_lazy_refcounts != lazy_refcounts) {
         if (lazy_refcounts) {
             if (s->qcow_version < 3) {
-                fprintf(stderr, "Lazy refcounts only supported with compatibility "
-                        "level 1.1 and above (use compat=1.1 or greater)\n");
+                error_report("Lazy refcounts only supported with compatibility "
+                             "level 1.1 and above (use compat=1.1 or greater)");
                 return -EINVAL;
             }
             s->compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 9045544..2fd92ca 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -281,19 +281,19 @@ No errors were found on the image.
 === Testing invalid configurations ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
-Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above (use compat=1.1 or greater)
 qemu-img: Error while amending options: Invalid argument
-Unknown compatibility level 0.42.
+qemu-img: Unknown compatibility level 0.42
 qemu-img: Error while amending options: Invalid argument
 qemu-img: Invalid parameter 'foo'
 qemu-img: Invalid options for file format 'qcow2'
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
-Changing the encryption flag is not supported.
+qemu-img: Changing the encryption flag is not supported
 qemu-img: Error while amending options: Operation not supported
-Cannot change preallocation mode.
+qemu-img: Cannot change preallocation mode
 qemu-img: Error while amending options: Operation not supported
 
 === Testing correct handling of unset value ===
@@ -301,7 +301,7 @@ qemu-img: Error while amending options: Operation not supported
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
 Should work:
 Should not work:
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported
 qemu-img: Error while amending options: Operation not supported
 
 === Testing zero expansion on inactive clusters ===
-- 
1.9.3


* [Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false)
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (13 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e82120c..423af48 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2718,9 +2718,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
             error_report("Cannot change refcount entry width");
             return -ENOTSUP;
         } else {
-            /* if this assertion fails, this probably means a new option was
+            /* if this point is reached, this probably means a new option was
              * added without having it covered here */
-            assert(false);
+            abort();
         }
 
         desc++;
-- 
1.9.3


* [Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (14 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB " Max Reitz
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

If the image version should be upgraded, that is the first thing we should
do; if it should be downgraded, that is the last thing we should do. So split
the version change block into an upgrade part at the start and a downgrade
part at the end.
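
The ordering argument can be illustrated with a toy model. This is a sketch
under assumptions, not the qcow2 code: toy_image, toy_amend() and the single
version-3-only feature flag are hypothetical and merely stand in for options
such as lazy refcounts, which require compat=1.1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical image state: a format version and one feature that is only
 * valid on version 3 (analogous to lazy refcounts). */
typedef struct toy_image {
    int version;
    bool v3_feature;
} toy_image;

/* Amend the image to new_version with the feature set to want_feature.
 * Returns 0 on success or -EINVAL if the request is inconsistent. */
static int toy_amend(toy_image *img, int new_version, bool want_feature)
{
    int old_version = img->version;

    assert(new_version == 2 || new_version == 3);

    /* Check against the target version, not the current one */
    if (want_feature && new_version < 3) {
        return -EINVAL;
    }

    /* Upgrade first: the feature may only be enabled on version 3 */
    if (new_version > old_version) {
        img->version = new_version;
    }

    /* Change features while the image is at the higher of both versions */
    img->v3_feature = want_feature;

    /* Downgrade last: the unsupported feature has been removed by now */
    if (new_version < old_version) {
        assert(!img->v3_feature);
        img->version = new_version;
    }

    return 0;
}
```

With this order, a single request can both upgrade and enable the feature, or
both disable the feature and downgrade, without passing through an invalid
intermediate state.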

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 423af48..d2553b9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2726,20 +2726,13 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         desc++;
     }
 
-    if (new_version != old_version) {
-        if (new_version > old_version) {
-            /* Upgrade */
-            s->qcow_version = new_version;
-            ret = qcow2_update_header(bs);
-            if (ret < 0) {
-                s->qcow_version = old_version;
-                return ret;
-            }
-        } else {
-            ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
-            if (ret < 0) {
-                return ret;
-            }
+    /* Upgrade first (some features may require compat=1.1) */
+    if (new_version > old_version) {
+        s->qcow_version = new_version;
+        ret = qcow2_update_header(bs);
+        if (ret < 0) {
+            s->qcow_version = old_version;
+            return ret;
         }
     }
 
@@ -2753,7 +2746,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 
     if (s->use_lazy_refcounts != lazy_refcounts) {
         if (lazy_refcounts) {
-            if (s->qcow_version < 3) {
+            if (new_version < 3) {
                 error_report("Lazy refcounts only supported with compatibility "
                              "level 1.1 and above (use compat=1.1 or greater)");
                 return -EINVAL;
@@ -2789,6 +2782,14 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         }
     }
 
+    /* Downgrade last (so unsupported features can be removed before) */
+    if (new_version < old_version) {
+        ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
     return 0;
 }
 
-- 
1.9.3


* [Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB for amend
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (15 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment Max Reitz
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.
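
One way such a coordinating callback could combine progress is sketched
below. It is a simplified illustration with hypothetical names
(combined_progress, combined_cb(), store_last(), COMBINED_SCALE), and it
assumes that every operation contributes an equal share to the overall
progress. The helper added by this patch instead keeps a cumulative offset
and per-operation work sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical final consumer of the combined progress. */
typedef void sink_fn(int64_t offset, int64_t total, void *opaque);

/* Hypothetical coordinator state, passed as the opaque value to
 * combined_cb(). The coordinating code bumps operations_completed between
 * operations. */
typedef struct combined_progress {
    sink_fn *sink;            /* original status callback */
    void *sink_opaque;        /* its opaque value */
    int total_operations;     /* number of operations that will run */
    int operations_completed; /* operations already finished */
} combined_progress;

#define COMBINED_SCALE 1000 /* resolution of one operation's share */

/* Example sink: stores the last (offset, total) pair in an int64_t[2]. */
static void store_last(int64_t offset, int64_t total, void *opaque)
{
    int64_t *out = opaque;

    out[0] = offset;
    out[1] = total;
}

/* Maps the progress of the current operation into its slice of the overall
 * range and forwards the result to the original callback. Assumes
 * op_offset * COMBINED_SCALE does not overflow int64_t. */
static void combined_cb(int64_t op_offset, int64_t op_work_size, void *opaque)
{
    combined_progress *cp = opaque;
    int64_t within = 0;

    assert(cp->total_operations > 0);
    assert(cp->operations_completed < cp->total_operations);

    if (op_work_size > 0) {
        within = op_offset * COMBINED_SCALE / op_work_size;
    }

    if (cp->sink) {
        cp->sink(cp->operations_completed * (int64_t)COMBINED_SCALE + within,
                 cp->total_operations * (int64_t)COMBINED_SCALE,
                 cp->sink_opaque);
    }
}
```

The equal-share assumption keeps the sketch short, but it makes the reported
progress jump unevenly when the operations differ greatly in cost.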

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d2553b9..0263019 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2655,6 +2655,75 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     return 0;
 }
 
+typedef enum Qcow2AmendOperation {
+    /* This is the value Qcow2AmendHelperCBInfo::last_operation will be
+     * statically initialized to so that the helper CB can discern the first
+     * invocation from an operation change */
+    QCOW2_NO_OPERATION = 0,
+
+    QCOW2_DOWNGRADING,
+} Qcow2AmendOperation;
+
+typedef struct Qcow2AmendHelperCBInfo {
+    /* The code coordinating the amend operations should only modify
+     * these four fields; the rest will be managed by the CB */
+    BlockDriverAmendStatusCB *original_status_cb;
+    void *original_cb_opaque;
+
+    Qcow2AmendOperation current_operation;
+
+    /* Total number of operations to perform (only set once) */
+    int total_operations;
+
+    /* The following fields are managed by the CB */
+
+    /* Number of operations completed */
+    int operations_completed;
+
+    /* Cumulative offset of all completed operations */
+    int64_t offset_completed;
+
+    Qcow2AmendOperation last_operation;
+    int64_t last_work_size;
+} Qcow2AmendHelperCBInfo;
+
+static void qcow2_amend_helper_cb(BlockDriverState *bs,
+                                  int64_t operation_offset,
+                                  int64_t operation_work_size, void *opaque)
+{
+    Qcow2AmendHelperCBInfo *info = opaque;
+    int64_t current_work_size;
+    int64_t projected_work_size;
+
+    if (info->current_operation != info->last_operation) {
+        if (info->last_operation != QCOW2_NO_OPERATION) {
+            info->offset_completed += info->last_work_size;
+            info->operations_completed++;
+        }
+
+        info->last_operation = info->current_operation;
+    }
+
+    assert(info->total_operations > 0);
+    assert(info->operations_completed < info->total_operations);
+
+    info->last_work_size = operation_work_size;
+
+    current_work_size = info->offset_completed + operation_work_size;
+
+    /* current_work_size is the total work size for (operations_completed + 1)
+     * operations (which includes this one), so multiply it by the number of
+     * operations not covered and divide it by the number of operations
+     * covered to get a projection for the operations not covered */
+    projected_work_size = current_work_size * (info->total_operations -
+                                               info->operations_completed - 1)
+                                            / (info->operations_completed + 1);
+
+    info->original_status_cb(bs, info->offset_completed + operation_offset,
+                             current_work_size + projected_work_size,
+                             info->original_cb_opaque);
+}
+
 static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
                                BlockDriverAmendStatusCB *status_cb,
                                void *cb_opaque)
@@ -2669,6 +2738,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     bool encrypt;
     int ret;
     QemuOptDesc *desc = opts->list->desc;
+    Qcow2AmendHelperCBInfo helper_cb_info;
 
     while (desc && desc->name) {
         if (!qemu_opt_find(opts, desc->name)) {
@@ -2726,6 +2796,12 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         desc++;
     }
 
+    helper_cb_info = (Qcow2AmendHelperCBInfo){
+        .original_status_cb = status_cb,
+        .original_cb_opaque = cb_opaque,
+        .total_operations = (new_version < old_version)
+    };
+
     /* Upgrade first (some features may require compat=1.1) */
     if (new_version > old_version) {
         s->qcow_version = new_version;
@@ -2784,7 +2860,9 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
 
     /* Downgrade last (so unsupported features can be removed before) */
     if (new_version < old_version) {
-        ret = qcow2_downgrade(bs, new_version, status_cb, cb_opaque);
+        helper_cb_info.current_operation = QCOW2_DOWNGRADING;
+        ret = qcow2_downgrade(bs, new_version, &qcow2_amend_helper_cb,
+                              &helper_cb_info);
         if (ret < 0) {
             return ret;
         }
-- 
1.9.3


* [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (16 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB " Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-18 17:55   ` Eric Blake
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function Max Reitz
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.
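
For orientation, the quantities which depend on the refcount order can be
sketched as below. The sketch_*() helpers are hypothetical names used for
illustration only; the formulas follow what this patch assigns to
s->refcount_bits, s->refcount_block_size and s->refcount_max.

```c
#include <assert.h>
#include <stdint.h>

/* Width of a single refcount entry in bits */
static int sketch_refcount_bits(int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return 1 << refcount_order;
}

/* Number of refcount entries fitting into one refblock (one cluster):
 * a cluster holds 2^(cluster_bits + 3) bits and every entry takes
 * 2^refcount_order bits */
static int64_t sketch_refblock_entries(int cluster_bits, int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return INT64_C(1) << (cluster_bits + 3 - refcount_order);
}

/* Largest refcount value; 64 bit entries are capped at INT64_MAX, as
 * done in the patch */
static uint64_t sketch_refcount_max(int refcount_order)
{
    int bits = sketch_refcount_bits(refcount_order);

    return bits < 64 ? (UINT64_C(1) << bits) - 1 : (uint64_t)INT64_MAX;
}
```

Assuming 64 kB clusters (cluster_bits = 16), a refblock holds 32768 entries
of 16 bits, but 524288 entries of 1 bit and only 8192 entries of 64 bits.
This is why changing the order generally changes the number of refblocks
and thus the size of the reftable.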

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 457 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h          |   4 +
 2 files changed, 461 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2e13a9c..b59a028 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -2484,3 +2484,460 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
 
     return 0;
 }
+
+/* A pointer to a function of this type is given to walk_over_reftable(). That
+ * function will create refblocks and pass them to a RefblockFinishOp once they
+ * are completed (@refblock). @refblock_empty is set if the refblock is
+ * completely empty.
+ *
+ * Along with the refblock, a corresponding reftable entry is passed, in the
+ * reftable @reftable (which may be reallocated) at @reftable_index.
+ *
+ * @allocated should be set to true if a new cluster has been allocated.
+ */
+typedef int (RefblockFinishOp)(BlockDriverState *bs, uint64_t **reftable,
+                               uint64_t reftable_index, uint64_t *reftable_size,
+                               void *refblock, bool refblock_empty,
+                               bool *allocated, Error **errp);
+
+/**
+ * This "operation" for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+                          uint64_t reftable_index, uint64_t *reftable_size,
+                          void *refblock, bool refblock_empty, bool *allocated,
+                          Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+
+    if (!refblock_empty && reftable_index >= *reftable_size) {
+        uint64_t *new_reftable;
+        uint64_t new_reftable_size;
+
+        new_reftable_size = ROUND_UP(reftable_index + 1,
+                                     s->cluster_size / sizeof(uint64_t));
+        if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+            error_setg(errp,
+                       "This operation would make the refcount table grow "
+                       "beyond the maximum size supported by QEMU, aborting");
+            return -ENOTSUP;
+        }
+
+        new_reftable = g_try_realloc(*reftable, new_reftable_size *
+                                                sizeof(uint64_t));
+        if (!new_reftable) {
+            error_setg(errp, "Failed to increase reftable buffer size");
+            return -ENOMEM;
+        }
+
+        memset(new_reftable + *reftable_size, 0,
+               (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+        *reftable      = new_reftable;
+        *reftable_size = new_reftable_size;
+    }
+
+    if (!refblock_empty && !(*reftable)[reftable_index]) {
+        offset = qcow2_alloc_clusters(bs, s->cluster_size);
+        if (offset < 0) {
+            error_setg_errno(errp, -offset, "Failed to allocate refblock");
+            return offset;
+        }
+        (*reftable)[reftable_index] = offset;
+        *allocated = true;
+    }
+
+    return 0;
+}
+
+/**
+ * This "operation" for walk_over_reftable() writes the refblock to disk at the
+ * offset specified by the new reftable's entry. It does not modify the new
+ * reftable or change any refcounts.
+ */
+static int flush_refblock(BlockDriverState *bs, uint64_t **reftable,
+                          uint64_t reftable_index, uint64_t *reftable_size,
+                          void *refblock, bool refblock_empty, bool *allocated,
+                          Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t offset;
+    int ret;
+
+    if (reftable_index < *reftable_size && (*reftable)[reftable_index]) {
+        offset = (*reftable)[reftable_index];
+
+        ret = qcow2_pre_write_overlap_check(bs, 0, offset, s->cluster_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Overlap check failed");
+            return ret;
+        }
+
+        ret = bdrv_pwrite(bs->file, offset, refblock, s->cluster_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to write refblock");
+            return ret;
+        }
+    } else {
+        assert(refblock_empty);
+    }
+
+    return 0;
+}
+
+/**
+ * This function walks over the existing reftable and every referenced refblock;
+ * if @new_set_refcount is non-NULL, it is called for every refcount entry to
+ * create an equal new entry in the passed @new_refblock. Once that
+ * @new_refblock is completely filled, @operation will be called.
+ *
+ * @status_cb and @cb_opaque are used for the amend operation's status callback.
+ * @index is the index of this walk_over_reftable() call and @total is the total
+ * number of walk_over_reftable() calls per amend operation. Both are used for
+ * calculating the parameters for the status callback.
+ *
+ * @allocated is set to true if a new cluster has been allocated.
+ */
+static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
+                              uint64_t *new_reftable_index,
+                              uint64_t *new_reftable_size,
+                              void *new_refblock, int new_refblock_size,
+                              int new_refcount_bits,
+                              RefblockFinishOp *operation, bool *allocated,
+                              Qcow2SetRefcountFunc *new_set_refcount,
+                              BlockDriverAmendStatusCB *status_cb,
+                              void *cb_opaque, int index, int total,
+                              Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t reftable_index;
+    bool new_refblock_empty = true;
+    int refblock_index;
+    int new_refblock_index = 0;
+    int ret;
+
+    for (reftable_index = 0; reftable_index < s->refcount_table_size;
+         reftable_index++)
+    {
+        uint64_t refblock_offset = s->refcount_table[reftable_index]
+                                 & REFT_OFFSET_MASK;
+
+        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
+                  (uint64_t)total * s->refcount_table_size, cb_opaque);
+
+        if (refblock_offset) {
+            void *refblock;
+
+            if (offset_into_cluster(s, refblock_offset)) {
+                qcow2_signal_corruption(bs, true, -1, -1, "Refblock offset %#"
+                                        PRIx64 " unaligned (reftable index: %#"
+                                        PRIx64 ")", refblock_offset,
+                                        reftable_index);
+                error_setg(errp,
+                           "Image is corrupt (unaligned refblock offset)");
+                return -EIO;
+            }
+
+            ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offset,
+                                  &refblock);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to retrieve refblock");
+                return ret;
+            }
+
+            for (refblock_index = 0; refblock_index < s->refcount_block_size;
+                 refblock_index++)
+            {
+                uint64_t refcount;
+
+                if (new_refblock_index >= new_refblock_size) {
+                    /* new_refblock is now complete */
+                    ret = operation(bs, new_reftable, *new_reftable_index,
+                                    new_reftable_size, new_refblock,
+                                    new_refblock_empty, allocated, errp);
+                    if (ret < 0) {
+                        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+                        return ret;
+                    }
+
+                    (*new_reftable_index)++;
+                    new_refblock_index = 0;
+                    new_refblock_empty = true;
+                }
+
+                refcount = s->get_refcount(refblock, refblock_index);
+                if (new_refcount_bits < 64 && refcount >> new_refcount_bits) {
+                    uint64_t offset;
+
+                    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+                    offset = ((reftable_index << s->refcount_block_bits)
+                              + refblock_index) << s->cluster_bits;
+
+                    error_setg(errp, "Cannot decrease refcount entry width to "
+                               "%i bits: Cluster at offset %#" PRIx64 " has a "
+                               "refcount of %" PRIu64, new_refcount_bits,
+                               offset, refcount);
+                    return -EINVAL;
+                }
+
+                if (new_set_refcount) {
+                    new_set_refcount(new_refblock, new_refblock_index++, refcount);
+                } else {
+                    new_refblock_index++;
+                }
+                new_refblock_empty = new_refblock_empty && refcount == 0;
+            }
+
+            ret = qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "Failed to put refblock back into "
+                                 "the cache");
+                return ret;
+            }
+        } else {
+            /* No refblock means every refcount is 0 */
+            for (refblock_index = 0; refblock_index < s->refcount_block_size;
+                 refblock_index++)
+            {
+                if (new_refblock_index >= new_refblock_size) {
+                    /* new_refblock is now complete */
+                    ret = operation(bs, new_reftable, *new_reftable_index,
+                                    new_reftable_size, new_refblock,
+                                    new_refblock_empty, allocated, errp);
+                    if (ret < 0) {
+                        return ret;
+                    }
+
+                    (*new_reftable_index)++;
+                    new_refblock_index = 0;
+                    new_refblock_empty = true;
+                }
+
+                if (new_set_refcount) {
+                    new_set_refcount(new_refblock, new_refblock_index++, 0);
+                } else {
+                    new_refblock_index++;
+                }
+            }
+        }
+    }
+
+    if (new_refblock_index > 0) {
+        /* Complete the potentially existing partially filled final refblock */
+        if (new_set_refcount) {
+            for (; new_refblock_index < new_refblock_size;
+                 new_refblock_index++)
+            {
+                new_set_refcount(new_refblock, new_refblock_index, 0);
+            }
+        }
+
+        ret = operation(bs, new_reftable, *new_reftable_index,
+                        new_reftable_size, new_refblock, new_refblock_empty,
+                        allocated, errp);
+        if (ret < 0) {
+            return ret;
+        }
+
+        (*new_reftable_index)++;
+    }
+
+    status_cb(bs, (uint64_t)(index + 1) * s->refcount_table_size,
+              (uint64_t)total * s->refcount_table_size, cb_opaque);
+
+    return 0;
+}
+
+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+                                BlockDriverAmendStatusCB *status_cb,
+                                void *cb_opaque, Error **errp)
+{
+    BDRVQcowState *s = bs->opaque;
+    Qcow2GetRefcountFunc *new_get_refcount;
+    Qcow2SetRefcountFunc *new_set_refcount;
+    void *new_refblock = qemu_blockalign(bs->file, s->cluster_size);
+    uint64_t *new_reftable = NULL, new_reftable_size = 0;
+    uint64_t *old_reftable, old_reftable_size, old_reftable_offset;
+    uint64_t new_reftable_index = 0;
+    uint64_t i;
+    int64_t new_reftable_offset = 0, allocated_reftable_size = 0;
+    int new_refblock_size, new_refcount_bits = 1 << refcount_order;
+    int old_refcount_order;
+    int walk_index = 0;
+    int ret;
+    bool new_allocation;
+
+    assert(s->qcow_version >= 3);
+    assert(refcount_order >= 0 && refcount_order <= 6);
+
+    /* see qcow2_open() */
+    new_refblock_size = 1 << (s->cluster_bits - (refcount_order - 3));
+
+    get_refcount_functions(refcount_order,
+                           &new_get_refcount, &new_set_refcount);
+
+
+    do {
+        int total_walks;
+
+        new_allocation = false;
+
+        /* We have to do at least this walk and the one which writes the
+         * refblocks. Normally, this loop runs at least twice: first to do the
+         * allocations, and second to determine that everything is correctly
+         * allocated. Together with the final write walk, this makes three
+         * walks in total */
+        total_walks = MIN(walk_index + 2, 3);
+
+        /* First, allocate the structures so they are present in the refcount
+         * structures */
+        ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+                                 &new_reftable_size, NULL, new_refblock_size,
+                                 new_refcount_bits, &alloc_refblock,
+                                 &new_allocation, NULL, status_cb, cb_opaque,
+                                 walk_index++, total_walks, errp);
+        if (ret < 0) {
+            goto done;
+        }
+
+        new_reftable_index = 0;
+
+        if (new_allocation) {
+            if (new_reftable_offset) {
+                qcow2_free_clusters(bs, new_reftable_offset,
+                                    allocated_reftable_size * sizeof(uint64_t),
+                                    QCOW2_DISCARD_NEVER);
+            }
+
+            new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
+                                                           sizeof(uint64_t));
+            if (new_reftable_offset < 0) {
+                error_setg_errno(errp, -new_reftable_offset,
+                                 "Failed to allocate the new reftable");
+                ret = new_reftable_offset;
+                goto done;
+            }
+            allocated_reftable_size = new_reftable_size;
+
+            new_allocation = true;
+        }
+    } while (new_allocation);
+
+    /* Second, write the new refblocks */
+    new_allocation = false;
+    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
+                             &new_reftable_size, new_refblock,
+                             new_refblock_size, new_refcount_bits,
+                             &flush_refblock, &new_allocation, new_set_refcount,
+                             status_cb, cb_opaque, walk_index, walk_index + 1,
+                             errp);
+    if (ret < 0) {
+        goto done;
+    }
+    assert(!new_allocation);
+
+
+    /* Write the new reftable */
+    ret = qcow2_pre_write_overlap_check(bs, 0, new_reftable_offset,
+                                        new_reftable_size * sizeof(uint64_t));
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Overlap check failed");
+        goto done;
+    }
+
+    for (i = 0; i < new_reftable_size; i++) {
+        cpu_to_be64s(&new_reftable[i]);
+    }
+
+    ret = bdrv_pwrite(bs->file, new_reftable_offset, new_reftable,
+                      new_reftable_size * sizeof(uint64_t));
+
+    for (i = 0; i < new_reftable_size; i++) {
+        be64_to_cpus(&new_reftable[i]);
+    }
+
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to write the new reftable");
+        goto done;
+    }
+
+
+    /* Empty the refcount cache */
+    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Failed to flush the refblock cache");
+        goto done;
+    }
+
+    /* Update the image header to point to the new reftable; this only updates
+     * the fields which are relevant to qcow2_update_header(); other fields
+     * such as s->refcount_table or s->refcount_bits stay stale for now
+     * (because we have to restore everything if qcow2_update_header() fails) */
+    old_refcount_order  = s->refcount_order;
+    old_reftable_size   = s->refcount_table_size;
+    old_reftable_offset = s->refcount_table_offset;
+
+    s->refcount_order        = refcount_order;
+    s->refcount_table_size   = new_reftable_size;
+    s->refcount_table_offset = new_reftable_offset;
+
+    ret = qcow2_update_header(bs);
+    if (ret < 0) {
+        s->refcount_order        = old_refcount_order;
+        s->refcount_table_size   = old_reftable_size;
+        s->refcount_table_offset = old_reftable_offset;
+        error_setg_errno(errp, -ret, "Failed to update the qcow2 header");
+        goto done;
+    }
+
+    /* Now update the rest of the in-memory information */
+    old_reftable = s->refcount_table;
+    s->refcount_table = new_reftable;
+
+    s->refcount_bits = 1 << refcount_order;
+    if (refcount_order < 6) {
+        s->refcount_max = (UINT64_C(1) << s->refcount_bits) - 1;
+    } else {
+        s->refcount_max = INT64_MAX;
+    }
+
+    s->refcount_block_bits = s->cluster_bits - (refcount_order - 3);
+    s->refcount_block_size = 1 << s->refcount_block_bits;
+
+    s->get_refcount = new_get_refcount;
+    s->set_refcount = new_set_refcount;
+
+    /* For cleaning up all old refblocks and the old reftable below the "done"
+     * label */
+    new_reftable        = old_reftable;
+    new_reftable_size   = old_reftable_size;
+    new_reftable_offset = old_reftable_offset;
+
+done:
+    if (new_reftable) {
+        /* On success, new_reftable actually points to the old reftable (and
+         * new_reftable_size is the old reftable's size); but that is just
+         * fine */
+        for (i = 0; i < new_reftable_size; i++) {
+            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
+            if (offset) {
+                qcow2_free_clusters(bs, offset, s->cluster_size,
+                                    QCOW2_DISCARD_NEVER);
+            }
+        }
+        g_free(new_reftable);
+
+        if (new_reftable_offset > 0) {
+            qcow2_free_clusters(bs, new_reftable_offset,
+                                new_reftable_size * sizeof(uint64_t),
+                                QCOW2_DISCARD_NEVER);
+        }
+    }
+
+    qemu_vfree(new_refblock);
+    return ret;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index fe12c54..5b96519 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -526,6 +526,10 @@ int qcow2_check_metadata_overlap(BlockDriverState *bs, int ign, int64_t offset,
 int qcow2_pre_write_overlap_check(BlockDriverState *bs, int ign, int64_t offset,
                                   int64_t size);
 
+int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
+                                BlockDriverAmendStatusCB *status_cb,
+                                void *cb_opaque, Error **errp);
+
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size);
-- 
1.9.3


* [Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (17 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths Max Reitz
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Make use of qcow2_change_refcount_order() to support changing the
refcount order with qemu-img amend.
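
The mapping from the user-visible width to the order can be sketched as
follows. This is a standalone illustration with hypothetical sketch_*()
names; the patch itself uses is_power_of_2() and ffs() for the same purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* A width is valid if it is a power of two between 1 and 64 bits */
static bool sketch_width_is_valid(long width)
{
    return width > 0 && width <= 64 && (width & (width - 1)) == 0;
}

/* Returns log2(width), that is the refcount order, or -1 if the width is
 * not valid */
static int sketch_width_to_order(long width)
{
    int order = 0;

    if (!sketch_width_is_valid(width)) {
        return -1;
    }

    while ((1L << order) < width) {
        order++;
    }
    return order;
}
```

A width of 16 bits thus maps to order 4, which is the only value permitted
for compat=0.10 images.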

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c | 44 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0263019..469650b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2607,13 +2607,7 @@ static int qcow2_downgrade(BlockDriverState *bs, int target_version,
     }
 
     if (s->refcount_order != 4) {
-        /* we would have to convert the image to a refcount_order == 4 image
-         * here; however, since qemu (at the time of writing this) does not
-         * support anything different than 4 anyway, there is no point in doing
-         * so right now; however, we should error out (if qemu supports this in
-         * the future and this code has not been adapted) */
-        error_report("qcow2_downgrade: Image refcount orders other than 4 are "
-                     "currently not supported.");
+        error_report("compat=0.10 requires refcount_width=16");
         return -ENOTSUP;
     }
 
@@ -2661,6 +2655,7 @@ typedef enum Qcow2AmendOperation {
      * invocation from an operation change */
     QCOW2_NO_OPERATION = 0,
 
+    QCOW2_CHANGING_REFCOUNT_ORDER,
     QCOW2_DOWNGRADING,
 } Qcow2AmendOperation;
 
@@ -2736,6 +2731,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
     const char *compat = NULL;
     uint64_t cluster_size = s->cluster_size;
     bool encrypt;
+    int refcount_width = s->refcount_bits;
     int ret;
     QemuOptDesc *desc = opts->list->desc;
     Qcow2AmendHelperCBInfo helper_cb_info;
@@ -2785,8 +2781,16 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
             lazy_refcounts = qemu_opt_get_bool(opts, "lazy_refcounts",
                                                lazy_refcounts);
         } else if (!strcmp(desc->name, "refcount_width")) {
-            error_report("Cannot change refcount entry width");
-            return -ENOTSUP;
+            refcount_width = qemu_opt_get_number(opts, "refcount_width",
+                                                 refcount_width);
+
+            if (refcount_width <= 0 || refcount_width > 64 ||
+                !is_power_of_2(refcount_width))
+            {
+                error_report("Refcount width must be a power of two and may "
+                             "not exceed 64 bits");
+                return -EINVAL;
+            }
         } else {
             /* if this point is reached, this probably means a new option was
              * added without having it covered here */
@@ -2800,6 +2804,7 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         .original_status_cb = status_cb,
         .original_cb_opaque = cb_opaque,
         .total_operations = (new_version < old_version)
+                          + (s->refcount_bits != refcount_width)
     };
 
     /* Upgrade first (some features may require compat=1.1) */
@@ -2812,6 +2817,27 @@ static int qcow2_amend_options(BlockDriverState *bs, QemuOpts *opts,
         }
     }
 
+    if (s->refcount_bits != refcount_width) {
+        int refcount_order = ffs(refcount_width) - 1;
+        Error *local_error = NULL;
+
+        if (new_version < 3 && refcount_width != 16) {
+            error_report("Different refcount widths than 16 bits require "
+                         "compatibility level 1.1 or above (use compat=1.1 or "
+                         "greater)");
+            return -EINVAL;
+        }
+
+        helper_cb_info.current_operation = QCOW2_CHANGING_REFCOUNT_ORDER;
+        ret = qcow2_change_refcount_order(bs, refcount_order,
+                                          &qcow2_amend_helper_cb,
+                                          &helper_cb_info, &local_error);
+        if (ret < 0) {
+            qerror_report_err(local_error);
+            return ret;
+        }
+    }
+
     if (backing_file || backing_format) {
         ret = qcow2_change_backing_file(bs, backing_file ?: bs->backing_file,
                                         backing_format ?: bs->backing_format);
-- 
1.9.3


* [Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (18 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths Max Reitz
  20 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

If a reference count is not representable with the current refcount
order, the image check should point to qemu-img amend for increasing the
refcount order. However, qemu-img amend needs write access to the image
which cannot be provided if the image is marked corrupt; and the image
check will not mark the image consistent unless everything actually is
consistent.

Therefore, if an image is marked corrupt and the image check encounters
a reference count overflow, it cannot be fixed by using qemu-img amend
to increase the refcount order. Instead, one has to use qemu-img convert
to create a completely new copy of the image in this case.

Alternatively, we may want to give the user a way of manually removing
the corrupt flag, maybe through qemu-img amend, but this is not part of
this patch.
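
For reference, whether a refcount is representable with a given entry width
can be sketched like this. sketch_refcount_fits() is a hypothetical helper
for illustration; it mirrors the condition walk_over_reftable() (patch 18)
uses to reject narrowing the width, not the overflow test in the image
check itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if @refcount can be stored in an entry of @refcount_bits bits */
static bool sketch_refcount_fits(uint64_t refcount, int refcount_bits)
{
    assert(refcount_bits > 0 && refcount_bits <= 64);

    /* Shifting a 64 bit value by 64 is undefined in C, so the full
     * width is handled separately */
    return refcount_bits == 64 || (refcount >> refcount_bits) == 0;
}
```

A refcount of 2 does not fit into a 1 bit entry, for example, so such an
image needs a wider entry before another reference can be added.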

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-refcount.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index b59a028..e9647ce 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1368,6 +1368,9 @@ static int inc_refcounts(BlockDriverState *bs,
         if (refcount == s->refcount_max) {
             fprintf(stderr, "ERROR: overflow cluster offset=0x%" PRIx64
                     "\n", cluster_offset);
+            fprintf(stderr, "Use qemu-img amend to increase the refcount entry "
+                    "width or qemu-img convert to create a clean copy if the "
+                    "image cannot be opened for writing\n");
             res->corruptions++;
             continue;
         }
-- 
1.9.3


* [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
                   ` (19 preceding siblings ...)
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check Max Reitz
@ 2014-11-14 13:06 ` Max Reitz
  2014-11-15 14:50   ` Eric Blake
  20 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-14 13:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi, Max Reitz

Add a test for conversion between different refcount widths and errors
specific to certain widths (e.g. snapshots with refcount_width=1).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/112     | 252 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/112.out | 131 +++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 384 insertions(+)
 create mode 100755 tests/qemu-iotests/112
 create mode 100644 tests/qemu-iotests/112.out

diff --git a/tests/qemu-iotests/112 b/tests/qemu-iotests/112
new file mode 100755
index 0000000..e824d8a
--- /dev/null
+++ b/tests/qemu-iotests/112
@@ -0,0 +1,252 @@
+#!/bin/bash
+#
+# Test cases for different refcount_widths
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+# This test will set refcount_width on its own which would conflict with the
+# manual setting; compat will be overridden as well
+_unsupported_imgopts refcount_width 'compat=0.10'
+
+function print_refcount_width()
+{
+    $QEMU_IMG info "$TEST_IMG" | sed -n '/refcount width:/ s/^ *//p'
+}
+
+echo
+echo '=== refcount_width limits ==='
+echo
+
+# Must be positive (non-zero)
+IMGOPTS="$IMGOPTS,refcount_width=0" _make_test_img 64M
+# Must be positive (non-negative)
+IMGOPTS="$IMGOPTS,refcount_width=-1" _make_test_img 64M
+# May not exceed 64
+IMGOPTS="$IMGOPTS,refcount_width=128" _make_test_img 64M
+# Must be a power of two
+IMGOPTS="$IMGOPTS,refcount_width=42" _make_test_img 64M
+
+# 1 is the minimum
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# 64 is the maximum
+IMGOPTS="$IMGOPTS,refcount_width=64" _make_test_img 64M
+print_refcount_width
+
+# 16 is the default
+_make_test_img 64M
+print_refcount_width
+
+echo
+echo '=== refcount_width and compat=0.10 ==='
+echo
+
+# Should work
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=16" _make_test_img 64M
+print_refcount_width
+
+# Should not work
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=1" _make_test_img 64M
+IMGOPTS="$IMGOPTS,compat=0.10,refcount_width=64" _make_test_img 64M
+
+
+echo
+echo '=== Snapshot limit on refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should fail for now; in the future, this might be supported by automatically
+# copying all clusters with overflowing refcount
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Snapshot limit on refcount_width=2 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=2" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should succeed
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+$QEMU_IMG snapshot -c bar "$TEST_IMG"
+# Should fail (4th reference)
+$QEMU_IMG snapshot -c baz "$TEST_IMG"
+
+# The new L1 table could/should be leaked
+_check_test_img
+
+echo
+echo '=== Compressed clusters with refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# Both should fit into a single host cluster; instead of failing to increase the
+# refcount of that cluster, qemu should just allocate a new cluster and make
+# this operation succeed
+$QEMU_IO -c 'write -P 0 -c  0  64k' \
+         -c 'write -P 1 -c 64k 64k' \
+         "$TEST_IMG" | _filter_qemu_io
+
+_check_test_img
+
+echo
+echo '=== Amend from refcount_width=16 to refcount_width=1 ==='
+echo
+
+_make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 16M 32M' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Amend from refcount_width=1 to refcount_width=64 ==='
+echo
+
+$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Amend to compat=0.10 ==='
+echo
+
+# Should not work because refcount_width needs to be 16 for compat=0.10
+$QEMU_IMG amend -o compat=0.10 "$TEST_IMG"
+print_refcount_width
+# Should work
+$QEMU_IMG amend -o compat=0.10,refcount_width=16 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+# Get back to compat=1.1 and refcount_width=16
+$QEMU_IMG amend -o compat=1.1 "$TEST_IMG"
+print_refcount_width
+# Should not work
+$QEMU_IMG amend -o refcount_width=32,compat=0.10 "$TEST_IMG"
+print_refcount_width
+
+echo
+echo '=== Amend with snapshot ==='
+echo
+
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+# Just to have different refcounts across the image
+$QEMU_IO -c 'write 0 16M' "$TEST_IMG" | _filter_qemu_io
+
+# Should not work (may work in the future by first decreasing all refcounts so
+# they fit into the target range by copying them)
+$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+# Should work
+$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
+_check_test_img
+print_refcount_width
+
+echo
+echo '=== Testing too many references for check ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# This cluster should be created at 0x50000
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+# Now make the second L2 entry (the L2 table should be at 0x40000) point to that
+# cluster, so we have two references
+poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
+
+# This should say "please use amend"
+_check_test_img -r all
+
+# So we do that
+$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
+print_refcount_width
+
+# And try again
+_check_test_img -r all
+
+echo
+echo '=== Multiple walks necessary during amend ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" _make_test_img 64k
+
+# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 table, a
+# single L2 table, the reftable and a single refblock. This creates 58 data
+# clusters (actually, the L2 table is created here, too), so in total there are
+# then 63 used clusters in the image. With a refcount width of 64, one refblock
+# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), so this will
+# make the first target refblock have exactly one free entry.
+$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
+
+# Now change the refcount width; since the first target refblock has exactly one
+# free entry, that entry will be used to store its own reference. No other
+# refblocks are needed, so then the new reftable will be allocated; since the
+# first target refblock is completely filled up, this will require a new
+# refblock which is why the refcount width changing function will need to run
+# through everything one more time until the allocations are stable.
+$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
+print_refcount_width
+
+_check_test_img
+
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/112.out b/tests/qemu-iotests/112.out
new file mode 100644
index 0000000..907a05e
--- /dev/null
+++ b/tests/qemu-iotests/112.out
@@ -0,0 +1,131 @@
+QA output created by 112
+
+=== refcount_width limits ===
+
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 refcount_width=-1
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: TEST_DIR/t.IMGFMT: Refcount width must be a power of two and may not exceed 64 bits
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 64
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 16
+
+=== refcount_width and compat=0.10 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 16
+qemu-img: TEST_DIR/t.IMGFMT: Different refcount widths than 16 bits require compatibility level 1.1 or above (use compat=1.1 or greater)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: TEST_DIR/t.IMGFMT: Different refcount widths than 16 bits require compatibility level 1.1 or above (use compat=1.1 or greater)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+=== Snapshot limit on refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Could not create snapshot 'foo': -22 (Invalid argument)
+Leaked cluster 6 refcount=1 reference=0
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+
+=== Snapshot limit on refcount_width=2 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 2
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Could not create snapshot 'baz': -22 (Invalid argument)
+Leaked cluster 7 refcount=1 reference=0
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+
+=== Compressed clusters with refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+
+=== Amend from refcount_width=16 to refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 16
+wrote 33554432/33554432 bytes at offset 16777216
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+refcount width: 1
+
+=== Amend from refcount_width=1 to refcount_width=64 ===
+
+No errors were found on the image.
+refcount width: 64
+
+=== Amend to compat=0.10 ===
+
+qemu-img: compat=0.10 requires refcount_width=16
+qemu-img: Error while amending options: Operation not supported
+refcount width: 64
+No errors were found on the image.
+refcount width: 16
+refcount width: 16
+qemu-img: Different refcount widths than 16 bits require compatibility level 1.1 or above (use compat=1.1 or greater)
+qemu-img: Error while amending options: Invalid argument
+refcount width: 16
+
+=== Amend with snapshot ===
+
+wrote 16777216/16777216 bytes at offset 0
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Cannot decrease refcount entry width to 1 bits: Cluster at offset 0x50000 has a refcount of 2
+qemu-img: Error while amending options: Invalid argument
+No errors were found on the image.
+refcount width: 16
+No errors were found on the image.
+refcount width: 2
+
+=== Testing too many references for check ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR: overflow cluster offset=0x50000
+Use qemu-img amend to increase the refcount entry width or qemu-img convert to create a clean copy if the image cannot be opened for writing
+
+1 errors were found on the image.
+Data may be corrupted, or further writes to the image may corrupt it.
+refcount width: 2
+ERROR cluster 5 refcount=1 reference=2
+Repairing cluster 5 refcount=1 reference=2
+Repairing OFLAG_COPIED data cluster: l2_entry=8000000000050000 refcount=2
+Repairing OFLAG_COPIED data cluster: l2_entry=8000000000050000 refcount=2
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    3 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+=== Multiple walks necessary during amend ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
+wrote 29696/29696 bytes at offset 0
+29 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+refcount width: 64
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 7dfe469..593f3dd 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -112,3 +112,4 @@
 107 rw auto quick
 108 rw auto quick
 111 rw auto quick
+112 rw auto
-- 
1.9.3


* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths Max Reitz
@ 2014-11-15 14:50   ` Eric Blake
  2014-11-17  8:34     ` Max Reitz
  2014-11-17 12:06     ` Max Reitz
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-15 14:50 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:06 AM, Max Reitz wrote:
> Add a test for conversion between different refcount widths and errors
> specific to certain widths (i.e. snapshots with refcount_width=1).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/112     | 252 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/112.out | 131 +++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 384 insertions(+)
>  create mode 100755 tests/qemu-iotests/112
>  create mode 100644 tests/qemu-iotests/112.out
> 
> +echo
> +echo '=== Testing too many references for check ==='
> +echo
> +
> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
> +print_refcount_width
> +
> +# This cluster should be created at 0x50000
> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
> +# Now make the second L2 entry (the L2 table should be at 0x40000) point to that
> +# cluster, so we have two references
> +poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
> +
> +# This should say "please use amend"
> +_check_test_img -r all
> +
> +# So we do that
> +$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
> +print_refcount_width
> +
> +# And try again
> +_check_test_img -r all

I think this section also deserves a test that fuzzes an image with
width=64 to intentionally set the most significant bit of one of the
refcounts, and make sure that we gracefully diagnose it as invalid.

> +
> +echo
> +echo '=== Multiple walks necessary during amend ==='
> +echo
> +
> +IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" _make_test_img 64k
> +
> +# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 table, a
> +# single L2 table, the reftable and a single refblock. This creates 58 data
> +# clusters (actually, the L2 table is created here, too), so in total there are
> +# then 63 used clusters in the image. With a refcount width of 64, one refblock
> +# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), so this will
> +# make the first target refblock have exactly one free entry.
> +$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
> +
> +# Now change the refcount width; since the first target refblock has exactly one
> +# free entry, that entry will be used to store its own reference. No other
> +# refblocks are needed, so then the new reftable will be allocated; since the
> +# first target refblock is completely filled up, this will require a new
> +# refblock which is why the refcount width changing function will need to run
> +# through everything one more time until the allocations are stable.
> +$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
> +print_refcount_width

Umm, that sounds backwards from what you document.  It's a good test of
the _new_ reftable needing a second round of allocations.  So keep it
with corrected comments.  But I think you _intended_ to write a test
that starts with a refcount_width=64 image and amends it to
refcount_width=1, where the _old_ reftable then suffers a reallocation
as part of allocating refblocks for the new table.  It may even help if
you add a tracepoint for every iteration through the walk function
callback, to prove we are indeed executing it 3 times instead of the
usual 2, for these test cases.
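
For reference, the entry counts in the quoted comment follow from simple
arithmetic: a refblock occupies one cluster, so it holds cluster_size * 8
bits divided by the refcount width. A minimal C sketch of that calculation;
refblock_entries() is a hypothetical helper name used only for illustration
and is not a function from this series:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the series): number of refcount entries
 * that fit into one refblock, i.e. into one cluster. */
static uint64_t refblock_entries(uint64_t cluster_size, int refcount_order)
{
    /* refcount_order is log2 of the entry width in bits; valid orders
     * are 0 (1 bit) through 6 (64 bits) */
    assert(refcount_order >= 0 && refcount_order <= 6);

    /* cluster_size * 8 bits per refblock, divided by the entry width */
    return (cluster_size * 8) >> refcount_order;
}
```

With cluster_size=512, order 6 (64 bits) gives 64 entries, as the comment
states, while order 0 (1 bit) gives 4096 entries, so a single 1-bit refblock
covers all 128 clusters of the 64k test image.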

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
@ 2014-11-15 16:00   ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-15 16:00 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:05 AM, Max Reitz wrote:
> Add the bit width of every refcount entry to the format-specific
> information.
> 
> In contrast to lazy_refcounts and the corrupt flag, this should be
> always emitted, even for compat=0.10 although it does not support any
> refcount width other than 16 bits. This is because if a boolean is
> optional, one normally assumes it to be false when omitted; but if an
> integer is not specified, it is rather difficult to guess its value.
> 
> This new field breaks some test outputs, fix them.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c              |  4 +++-
>  qapi/block-core.json       |  5 ++++-
>  tests/qemu-iotests/060.out |  1 +
>  tests/qemu-iotests/065     | 23 +++++++++++++++--------
>  tests/qemu-iotests/067.out |  5 +++++
>  tests/qemu-iotests/082.out |  7 +++++++
>  tests/qemu-iotests/089.out |  2 ++
>  7 files changed, 37 insertions(+), 10 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation
  2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation Max Reitz
@ 2014-11-15 16:50   ` Eric Blake
  2014-11-17  8:37     ` Max Reitz
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2014-11-15 16:50 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:05 AM, Max Reitz wrote:
> Add a helper function for reallocating a refcount array, independently

s/independently/independent/

> of the refcount order. The newly allocated space is zeroed and the
> function handles failed reallocations gracefully.

This patch is doing two things: it is refactoring things into a nice
helper function (mentioned), AND it is adding a guarantee that you now
always allocate a table on cluster boundaries, even when you aren't
using the full table (hinted at elsewhere in the series, but noticeably
absent here).  I think you want to add more comments to the commit
message making that more obvious, since it looks like you rely on that
guarantee later.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 121 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 72 insertions(+), 49 deletions(-)
> 

> +
> +static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
> +                                  int64_t *size, int64_t new_size)

I think this function deserves a comment stating that *array is actually
allocated to full cluster size with a 0 tail, so that it can be written
straight to disk.

> +{
> +    /* Round to clusters so the array can be directly written to disk */
> +    size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
> +                                    s->cluster_size);
> +    size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
> +                                    s->cluster_size);
> +    uint16_t *new_ptr;

Can old_byte_size ever equal new_byte_size?  Or are we guaranteed that
this will only be called when we really need to add another cluster to
the reftable?

[reading further]

Yes, it looks like *size and new_size are not necessarily
cluster-aligned, so as an example, it is very likely that we might call
realloc_refcount_array with the existing size of 20 and a new size of
21, both of which fit within the same byte size when rounded up to
cluster boundary.  But that means that the realloc is a no-op in that
case; might it be worth special-casing rather than wasting time on the
g_try_realloc and no-op memset?  [at least the code works correctly even
without a special case shortcut]
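
To make the 20-versus-21 case concrete, here is a rough C sketch of the
rounding. Both function names are hypothetical stand-ins, and the byte-size
computation is my assumption about what refcount_array_byte_size() does
(its body is not quoted here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rounding in realloc_refcount_array():
 * bytes needed for `entries` refcounts of (1 << refcount_order) bits each,
 * rounded up to a multiple of cluster_size. */
static size_t rounded_array_bytes(uint64_t entries, int refcount_order,
                                  size_t cluster_size)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    assert(cluster_size > 0);

    /* Round the bit count up to whole bytes so sub-byte widths survive */
    uint64_t bytes = ((entries << refcount_order) + 7) / 8;

    return (size_t)((bytes + cluster_size - 1) / cluster_size * cluster_size);
}

/* Returns nonzero if both entry counts round to the same byte size, in
 * which case the reallocation does not change the buffer size */
static int resize_keeps_allocation(uint64_t old_entries, uint64_t new_entries,
                                   int refcount_order, size_t cluster_size)
{
    return rounded_array_bytes(old_entries, refcount_order, cluster_size) ==
           rounded_array_bytes(new_entries, refcount_order, cluster_size);
}
```

With 16-bit entries and 64k clusters, 20 and 21 entries both round to one
cluster, so a shortcut would apply; going from 32768 to 32769 entries
crosses into a second cluster and needs the real reallocation.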

> +
> +    new_ptr = g_try_realloc(*array, new_byte_size);
> +    if (new_byte_size && !new_ptr) {
> +        return -ENOMEM;
> +    }

Is it worth asserting that new_byte_size is non-zero?  Why would anyone
ever call this to resize down to 0?  (But I can see where you DO call it
with old_byte_size of zero, when initializing data structures and using
this function for the first allocation.)

> +
> +    if (new_ptr) {

If we assert that new_byte_size is non-zero, then at this point, new_ptr
is non-NULL and this condition is pointless.

> +        memset((void *)((uintptr_t)new_ptr + old_byte_size), 0,
> +               new_byte_size - old_byte_size);
> +    }
> +
> +    *array = new_ptr;
> +    *size  = new_size;
> +
> +    return 0;
> +}
>  

Code looks correct as written, whether or not you also add more
comments, asserts, and/or shortcuts for no-op situations.  So:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification Max Reitz
@ 2014-11-15 17:02   ` Eric Blake
  2014-11-17  8:42     ` Max Reitz
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2014-11-15 17:02 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:06 AM, Max Reitz wrote:
> Since refcounts do not always have to be a uint16_t, all refcount blocks
> and arrays in memory should not have a specific type (thus they become
> pointers to void) and for accessing them, two helper functions are used
> (a getter and a setter). Those functions are called indirectly through
> function pointers in the BDRVQcowState so they may later be exchanged
> for different refcount orders.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 128 ++++++++++++++++++++++++++++++-------------------
>  block/qcow2.h          |   8 ++++
>  2 files changed, 87 insertions(+), 49 deletions(-)
> 

> @@ -1216,7 +1249,7 @@ enum {
>   * error occurred.
>   */
>  static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
> -    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
> +    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>      int flags)
>  {

Might be worth fixing the indentation here in addition to all the other
places you adjusted.  But that's minor.

> @@ -1933,17 +1967,13 @@ write_refblocks:
>              goto fail;
>          }
>  
> -        on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
> -        for (i = 0; i < s->refcount_block_size &&
> -                    refblock_start + i < *nb_clusters; i++)
> -        {
> -            on_disk_refblock[i] =
> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
> -        }
> +        /* The size of *refcount_table is always cluster-aligned, therefore the
> +         * write operation will not overflow */
> +        on_disk_refblock = (void *)((uintptr_t)*refcount_table +
> +                                    (refblock_index << s->refcount_block_bits));

Here is where you are relying on the guarantee that you added in 6/21,
which is why I ask for that one to mention it.

Nice reduction of a bounce buffer, by the way :)  Worth mentioning in
the commit message as an intentional part of this commit?

> @@ -2087,7 +2117,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          /* Because the old reftable has been exchanged for a new one the
>           * references have to be recalculated */
>          rebuild = false;
> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);

Phew; we're safe that this won't overflow; and good that you do the *
first (if you did the /8 first, it would fail for sub-byte refcounts).
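
A small C sketch of why the order of operations matters. The function names
are made up for the comparison, and the sketch deliberately ignores overflow
of the multiplication:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply first, as in the patch: exact whenever the total number of
 * bits is a multiple of 8 */
static uint64_t table_bytes_mul_first(uint64_t nb_clusters, int refcount_bits)
{
    assert(refcount_bits > 0 && refcount_bits <= 64);
    return nb_clusters * refcount_bits / 8;
}

/* Divide first: refcount_bits / 8 truncates to 0 for widths below 8 bits,
 * so nothing would be cleared for sub-byte refcounts */
static uint64_t table_bytes_div_first(uint64_t nb_clusters, int refcount_bits)
{
    assert(refcount_bits > 0 && refcount_bits <= 64);
    return nb_clusters * (refcount_bits / 8);
}
```

For 128 clusters with 1-bit refcounts, the first form yields 16 bytes and
the second yields 0; for widths of 8 bits and above both agree.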

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers for refcount modification
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers " Max Reitz
@ 2014-11-15 17:08   ` Eric Blake
  2014-11-17  8:44     ` Max Reitz
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2014-11-15 17:08 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:06 AM, Max Reitz wrote:
> Add helper functions for getting and setting refcounts in a refcount
> array for any possible refcount order, and choose the correct one during
> refcount initialization.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 146 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 144 insertions(+), 2 deletions(-)
> 

> 
> +static void get_refcount_functions(int refcount_order,
> +                                   Qcow2GetRefcountFunc **get,
> +                                   Qcow2SetRefcountFunc **set)
> +{
> +    switch (refcount_order) {
> +        case 0:
> +            *get = &get_refcount_ro0;
> +            *set = &set_refcount_ro0;
> +            break;

Bike-shedding: instead of a switch statement and open-coded assignments,
is it worth setting up an array of function pointers where you just grab
the correct functions by doing array[refcount_order]?  But I don't see
any strong reason to change style; what you have works.
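
Roughly what I have in mind, as a self-contained C sketch showing getters
only. The typedef signature, the helper names and the bit packing of
sub-byte entries (lowest bits first) are assumptions for illustration; the
real Qcow2GetRefcountFunc and get_refcount_ro*() in the patch may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed getter signature; the real Qcow2GetRefcountFunc may differ */
typedef uint64_t GetRefcountFunc(const void *array, uint64_t index);

/* Widths below 8 bits: several entries share one byte (assumed to be
 * packed lowest bits first) */
static uint64_t get_sub_byte(const void *array, uint64_t index, int bits)
{
    const uint8_t *bytes = array;
    unsigned per_byte = 8 / bits;
    unsigned shift = (unsigned)(index % per_byte) * bits;

    return (bytes[index / per_byte] >> shift) & ((1u << bits) - 1);
}

/* Widths of 8 bits and more: read a big-endian integer of nbytes bytes */
static uint64_t get_big_endian(const void *array, uint64_t index, int nbytes)
{
    const uint8_t *p = (const uint8_t *)array + index * nbytes;
    uint64_t value = 0;

    for (int i = 0; i < nbytes; i++) {
        value = (value << 8) | p[i];
    }
    return value;
}

static uint64_t get_ro0(const void *a, uint64_t i) { return get_sub_byte(a, i, 1); }
static uint64_t get_ro1(const void *a, uint64_t i) { return get_sub_byte(a, i, 2); }
static uint64_t get_ro2(const void *a, uint64_t i) { return get_sub_byte(a, i, 4); }
static uint64_t get_ro3(const void *a, uint64_t i) { return get_big_endian(a, i, 1); }
static uint64_t get_ro4(const void *a, uint64_t i) { return get_big_endian(a, i, 2); }
static uint64_t get_ro5(const void *a, uint64_t i) { return get_big_endian(a, i, 4); }
static uint64_t get_ro6(const void *a, uint64_t i) { return get_big_endian(a, i, 8); }

/* One table entry per refcount order, replacing the switch statement */
static GetRefcountFunc *const refcount_getters[] = {
    get_ro0, get_ro1, get_ro2, get_ro3, get_ro4, get_ro5, get_ro6,
};

static GetRefcountFunc *lookup_getter(int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return refcount_getters[refcount_order];
}
```

A setter table would be built the same way. The table trades the explicit
switch for a bounds check, which is mostly a matter of taste.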

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4 Max Reitz
@ 2014-11-15 17:09   ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-15 17:09 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:06 AM, Max Reitz wrote:
> No longer refuse to open images with a different refcount entry width
> than 16 bits; only reject images with a refcount width larger than 64
> bits (which is prohibited by the specification).
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d70e927..528d696 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -677,10 +677,10 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      /* Check support for various header values */
> -    if (header.refcount_order != 4) {
> -        report_unsupported(bs, errp, "%d bit reference counts",
> -                           1 << header.refcount_order);
> -        ret = -ENOTSUP;
> +    if (header.refcount_order > 6) {
> +        error_setg(errp, "Reference count entry width too large; may not "
> +                   "exceed 64 bit");

s/bit/bits/

Maintainer can make that tweak, so:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
@ 2014-11-15 17:13   ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-15 17:13 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/14/2014 06:06 AM, Max Reitz wrote:
> Add a refcount_order parameter to qcow2_create2(), use that value for
> the image header and for calculating the size required for
> preallocation.
> 
> For now, always pass 4.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 41 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 30 insertions(+), 11 deletions(-)
> 

> @@ -1811,6 +1811,13 @@ static int qcow2_create2(const char *filename, int64_t total_size,
>          int64_t meta_size = 0;
>          uint64_t nreftablee, nrefblocke, nl1e, nl2e;
>          int64_t aligned_total_size = align_offset(total_size, cluster_size);
> +        int refblock_bits, refblock_size;
> +        /* refcount entry size in bytes */
> +        double rces = (1 << refcount_order) / 8.;
> +

Maybe worth a comment that absolute precision is not necessary, and that
we are okay that the result gets us to within a fraction of a percent of
the right value.
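
For illustration only, a minimal sketch of the quantities involved; the helper names are hypothetical and not part of the patch. Every valid entry size (1/8 byte up to 8 bytes) is a power of two and so is exactly representable as a double. Any imprecision would therefore come from the later size estimates rather than from rces itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: refcount entry size in bytes, mirroring the
 * "rces" expression from the quoted hunk. */
static double refcount_entry_size_bytes(int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return (1 << refcount_order) / 8.;
}

/* Hypothetical helper: number of refcount entries that fit into one
 * refblock (one cluster), computed in integer arithmetic. */
static uint64_t refblock_entries(uint64_t cluster_size, int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    return (cluster_size * 8) >> refcount_order;
}
```

With 512-byte clusters and 64-bit refcounts, refblock_entries(512, 6) gives the 64 entries per refblock that the test comments elsewhere in this thread rely on.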

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option Max Reitz
@ 2014-11-15 17:17   ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-15 17:17 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1564 bytes --]

On 11/14/2014 06:06 AM, Max Reitz wrote:
> Some tests do not work well with certain refcount widths (i.e. you
> cannot create internal snapshots with refcount_width=1), so make those
> widths unsupported.
> 
> Furthermore, add another filter to _filter_img_create in common.filter
> which filters out the refcount_width value.
> 
> This is necessary for test 079, which does actually work with any
> refcount width, but invoking qemu-img directly leads to the
> refcount_width value being visible in the output; use _make_test_img
> instead which will filter it out.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

> +++ b/tests/qemu-iotests/029
> @@ -44,6 +44,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow2
>  _supported_proto generic
>  _supported_os Linux
> +_unsupported_imgopts 'refcount_width=1[^0-9]'

Missed a comment here.

> +++ b/tests/qemu-iotests/common.filter
> @@ -190,7 +190,8 @@ _filter_img_create()
>          -e "s# block_size=[0-9]\\+##g" \
>          -e "s# block_state_zero=\\(on\\|off\\)##g" \
>          -e "s# log_size=[0-9]\\+##g" \
> -        -e "s/archipelago:a/TEST_DIR\//g"
> +        -e "s/archipelago:a/TEST_DIR\//g" \
> +        -e "s# refcount_width=[0-9]\\+##g"

I'm not convinced that \+ is portable sed; but as you are not the first
use, it's not worth changing.

Once you add the remaining comment,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-15 14:50   ` Eric Blake
@ 2014-11-17  8:34     ` Max Reitz
  2014-11-17 10:38       ` Max Reitz
  2014-11-17 12:06     ` Max Reitz
  1 sibling, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-17  8:34 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-15 at 15:50, Eric Blake wrote:
> On 11/14/2014 06:06 AM, Max Reitz wrote:
>> Add a test for conversion between different refcount widths and errors
>> specific to certain widths (i.e. snapshots with refcount_width=1).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/112     | 252 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/112.out | 131 +++++++++++++++++++++++
>>   tests/qemu-iotests/group   |   1 +
>>   3 files changed, 384 insertions(+)
>>   create mode 100755 tests/qemu-iotests/112
>>   create mode 100644 tests/qemu-iotests/112.out
>>
>> +echo
>> +echo '=== Testing too many references for check ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
>> +print_refcount_width
>> +
>> +# This cluster should be created at 0x50000
>> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
>> +# Now make the second L2 entry (the L2 table should be at 0x40000) point to that
>> +# cluster, so we have two references
>> +poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
>> +
>> +# This should say "please use amend"
>> +_check_test_img -r all
>> +
>> +# So we do that
>> +$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
>> +print_refcount_width
>> +
>> +# And try again
>> +_check_test_img -r all
> I think this section also deserves a test that fuzzes an image with
> width=64 to intentionally set the most significant bit of one of the
> refcounts, and make sure that we gracefully diagnose it as invalid.
>
>> +
>> +echo
>> +echo '=== Multiple walks necessary during amend ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" _make_test_img 64k
>> +
>> +# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 table, a
>> +# single L2 table, the reftable and a single refblock. This creates 58 data
>> +# clusters (actually, the L2 table is created here, too), so in total there are
>> +# then 63 used clusters in the image. With a refcount width of 64, one refblock
>> +# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), so this will
>> +# make the first target refblock have exactly one free entry.
>> +$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
>> +
>> +# Now change the refcount width; since the first target refblock has exactly one
>> +# free entry, that entry will be used to store its own reference. No other
>> +# refblocks are needed, so then the new reftable will be allocated; since the
>> +# first target refblock is completely filled up, this will require a new
>> +# refblock which is why the refcount width changing function will need to run
>> +# through everything one more time until the allocations are stable.
>> +$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
>> +print_refcount_width
> Umm, that sounds backwards from what you document.  It's a good test of
> the _new_ reftable needing a second round of allocations.  So keep it
> with corrected comments.  But I think you _intended_ to write a test
> that starts with a refcount_width=64 image and resize to a
> refcount_width=1, where the _old_ reftable then suffers a reallocation
> as part of allocating refblocks for the new table.

That's what you intended, but that's harder to test, so I settled for 
this. The comments are appropriate (note that "target refblock" 
refers to the refblocks after amendment). Note also that this does 
indeed fail with v1 of this series.

> It may even help if
> you add a tracepoint for every iteration through the walk function
> callback, to prove we are indeed executing it 3 times instead of the
> usual 2, for these test cases.

That's a good idea! I thought about adding some info, but totally forgot 
about trace points.

I'll see whether I can add a test for increasing the size of the 
original reftable during amendment, too.

Max

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation
  2014-11-15 16:50   ` Eric Blake
@ 2014-11-17  8:37     ` Max Reitz
  0 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-17  8:37 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-15 at 17:50, Eric Blake wrote:
> On 11/14/2014 06:05 AM, Max Reitz wrote:
>> Add a helper function for reallocating a refcount array, independently
> s/independently/independent/
>
>> of the refcount order. The newly allocated space is zeroed and the
>> function handles failed reallocations gracefully.
> This patch is doing two things: it is refactoring things into a nice
> helper function (mentioned), AND it is adding a guarantee that you now
> always allocate a table on cluster boundaries, even when you aren't
> using the full table (hinted at elsewhere in the series, but noticeably
> absent here).  I think you want to add more comments to the commit
> message making that more obvious, since it looks like you rely on that
> guarantee later.

Will do.

>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 121 +++++++++++++++++++++++++++++--------------------
>>   1 file changed, 72 insertions(+), 49 deletions(-)
>>
>> +
>> +static int realloc_refcount_array(BDRVQcowState *s, uint16_t **array,
>> +                                  int64_t *size, int64_t new_size)
> I think this function deserves a comment stating that *array is actually
> allocated to full cluster size with a 0 tail, so that it can be written
> straight to disk.

OK, will add a comment.

>> +{
>> +    /* Round to clusters so the array can be directly written to disk */
>> +    size_t old_byte_size = ROUND_UP(refcount_array_byte_size(s, *size),
>> +                                    s->cluster_size);
>> +    size_t new_byte_size = ROUND_UP(refcount_array_byte_size(s, new_size),
>> +                                    s->cluster_size);
>> +    uint16_t *new_ptr;
> Can old_byte_size ever equal new_byte_size?  Or are we guaranteed that
> this will only be called when we really need to add another cluster to
> the reftable?
>
> [reading further]
>
> Yes, it looks like *size and new_size are not necessarily
> cluster-aligned, so as an example, it is very likely that we might call
> realloc_refcount_array with the existing size of 20 and a new size of
> 21, both of which fit within the same byte size when rounded up to
> cluster boundary.  But that means that the realloc is a no-op in that
> case; might it be worth special-casing rather than wasting time on the
> g_try_realloc and no-op memset?  [at least the code works correctly even
> without a special case shortcut]

Well, it's probably not necessary, but it will most likely look 
better to catch that case.
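
A rough sketch of what such a shortcut could look like, under simplifying assumptions: plain realloc() stands in for g_try_realloc(), entries are a whole number of bytes wide, and all names are hypothetical rather than taken from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Hypothetical stand-in for the helper under review. The array is always
 * sized in whole clusters and the newly added tail is zeroed. */
static int realloc_array_sketch(void **array, size_t *entries,
                                size_t new_entries, size_t entry_bytes,
                                size_t cluster_size)
{
    size_t old_bytes = round_up(*entries * entry_bytes, cluster_size);
    size_t new_bytes = round_up(new_entries * entry_bytes, cluster_size);
    void *new_ptr;

    /* Resizing down to nothing is not expected to happen */
    assert(new_bytes > 0);

    if (new_bytes == old_bytes) {
        /* Same number of clusters: nothing to reallocate or to zero */
        *entries = new_entries;
        return 0;
    }

    new_ptr = realloc(*array, new_bytes);
    if (!new_ptr) {
        /* The old array is left untouched on failure */
        return -ENOMEM;
    }

    if (new_bytes > old_bytes) {
        memset((char *)new_ptr + old_bytes, 0, new_bytes - old_bytes);
    }

    *array = new_ptr;
    *entries = new_entries;
    return 0;
}
```

With 512-byte clusters and 2-byte entries, growing from 20 to 21 entries takes the early return, so neither the allocator nor memset is called.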

>> +
>> +    new_ptr = g_try_realloc(*array, new_byte_size);
>> +    if (new_byte_size && !new_ptr) {
>> +        return -ENOMEM;
>> +    }
> Is it worth asserting that new_byte_size is non-zero?  Why would anyone
> ever call this to resize down to 0?  (But I can see where you DO call it
> with old_byte_size of zero, when initializing data structures and using
> this function for the first allocation.)

Hm, considering every image that can be opened using the qcow2 driver 
needs at least one cluster (the header), we can rule out that this is 
called with new_size == 0 (which would be the only way new_byte_size 
could ever be 0 either).

>> +
>> +    if (new_ptr) {
> If we assert that new_byte_size is non-zero, then at this point, new_ptr
> is non-NULL and this condition is pointless.
>
>> +        memset((void *)((uintptr_t)new_ptr + old_byte_size), 0,
>> +               new_byte_size - old_byte_size);
>> +    }
>> +
>> +    *array = new_ptr;
>> +    *size  = new_size;
>> +
>> +    return 0;
>> +}
>>   
> Code looks correct as written, whether or not you also add more
> comments, asserts, and/or shortcuts for no-op situations.  So:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

Max

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification
  2014-11-15 17:02   ` Eric Blake
@ 2014-11-17  8:42     ` Max Reitz
  0 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-17  8:42 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-15 at 18:02, Eric Blake wrote:
> On 11/14/2014 06:06 AM, Max Reitz wrote:
>> Since refcounts do not always have to be a uint16_t, all refcount blocks
>> and arrays in memory should not have a specific type (thus they become
>> pointers to void) and for accessing them, two helper functions are used
>> (a getter and a setter). Those functions are called indirectly through
>> function pointers in the BDRVQcowState so they may later be exchanged
>> for different refcount orders.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 128 ++++++++++++++++++++++++++++++-------------------
>>   block/qcow2.h          |   8 ++++
>>   2 files changed, 87 insertions(+), 49 deletions(-)
>>
>> @@ -1216,7 +1249,7 @@ enum {
>>    * error occurred.
>>    */
>>   static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>> -    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>> +    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>>       int flags)
>>   {
> Might be worth fixing the indentation here in addition to all the other
> places you adjusted.  But that's minor.
>
>> @@ -1933,17 +1967,13 @@ write_refblocks:
>>               goto fail;
>>           }
>>   
>> -        on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
>> -        for (i = 0; i < s->refcount_block_size &&
>> -                    refblock_start + i < *nb_clusters; i++)
>> -        {
>> -            on_disk_refblock[i] =
>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>> -        }
>> +        /* The size of *refcount_table is always cluster-aligned, therefore the
>> +         * write operation will not overflow */
>> +        on_disk_refblock = (void *)((uintptr_t)*refcount_table +
>> +                                    (refblock_index << s->refcount_block_bits));
> Here is where you are relying on the guarantee that you added in 6/21,
> which is why I ask for that one to mention it.
>
> Nice reduction of a bounce buffer, by the way :)  Worth mentioning in
> the commit message as an intentional part of this commit?

Why not.

>> @@ -2087,7 +2117,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           /* Because the old reftable has been exchanged for a new one the
>>            * references have to be recalculated */
>>           rebuild = false;
>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
> Phew; we're safe that this won't overflow; and good that you do the *
> first (if you did the /8 first, it would fail for sub-byte refcounts).

Thanks for catching this, it is wrong (although it does the right thing). 
It should use refcount_array_byte_size(), which in this version of the 
series is introduced before this patch; the open-coded calculation is an 
artifact of swapping patches 6 and 7.
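
To illustrate why the order of operations matters for sub-byte refcounts, here is a minimal sketch of such a byte-size calculation. The function name and signature are hypothetical; this is not claimed to be the series' refcount_array_byte_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: bytes needed to store `entries` refcounts that are
 * each (1 << refcount_order) bits wide, rounded up to whole bytes. */
static size_t refcount_bytes_sketch(uint64_t entries, int refcount_order)
{
    assert(refcount_order >= 0 && refcount_order <= 6);
    /* Keeps the shift (and the +7 below) from overflowing 64 bits */
    assert(entries < (UINT64_C(1) << (64 - 6)));

    /* Multiply first, then divide: dividing the width by 8 first would
     * yield 0 for every width below 8 bits */
    return (size_t)(((entries << refcount_order) + 7) / 8);
}
```

For example, ten 1-bit refcounts need 2 bytes, whereas computing 10 * (1 / 8) in integer arithmetic would give 0.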

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers for refcount modification
  2014-11-15 17:08   ` Eric Blake
@ 2014-11-17  8:44     ` Max Reitz
  0 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-17  8:44 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-15 at 18:08, Eric Blake wrote:
> On 11/14/2014 06:06 AM, Max Reitz wrote:
>> Add helper functions for getting and setting refcounts in a refcount
>> array for any possible refcount order, and choose the correct one during
>> refcount initialization.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 146 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 144 insertions(+), 2 deletions(-)
>>
>> +static void get_refcount_functions(int refcount_order,
>> +                                   Qcow2GetRefcountFunc **get,
>> +                                   Qcow2SetRefcountFunc **set)
>> +{
>> +    switch (refcount_order) {
>> +        case 0:
>> +            *get = &get_refcount_ro0;
>> +            *set = &set_refcount_ro0;
>> +            break;
> Bike-shedding: instead of a switch statement and open-coded assignments,
> is it worth setting up an array of function pointers where you just grab
> the correct functions by doing array[refcount_order]?  But I don't see
> any strong reason to change style; what you have works.

I thought about it, but it wouldn't get much shorter. But maybe it looks 
nicer. I ought to think about it again.
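
For comparison, a small sketch of the table-based variant. It is only an illustration: it covers three widths, the names are hypothetical, and the bit order chosen for 1-bit entries (least significant bit first) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t GetRefcountFn(const void *array, uint64_t index);

/* 1-bit entries, least significant bit first within each byte (assumed) */
static uint64_t get_ro0(const void *array, uint64_t index)
{
    return (((const uint8_t *)array)[index / 8] >> (index % 8)) & 0x1;
}

/* 8-bit entries */
static uint64_t get_ro3(const void *array, uint64_t index)
{
    return ((const uint8_t *)array)[index];
}

/* 16-bit big-endian entries */
static uint64_t get_ro4(const void *array, uint64_t index)
{
    const uint8_t *p = (const uint8_t *)array + index * 2;
    return ((uint64_t)p[0] << 8) | p[1];
}

/* Indexed by refcount order; orders without a getter stay NULL */
static GetRefcountFn *const getters[] = {
    [0] = get_ro0,
    [3] = get_ro3,
    [4] = get_ro4,
};

/* Returns NULL for orders this sketch does not cover */
static GetRefcountFn *lookup_getter(int refcount_order)
{
    size_t count = sizeof(getters) / sizeof(getters[0]);

    if (refcount_order < 0 || (size_t)refcount_order >= count) {
        return NULL;
    }
    return getters[refcount_order];
}
```

The lookup needs its own bounds check, which is roughly what the default branch of a switch provides, so the table is not necessarily shorter.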

Max

> Reviewed-by: Eric Blake <eblake@redhat.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-17  8:34     ` Max Reitz
@ 2014-11-17 10:38       ` Max Reitz
  2014-11-17 11:02         ` Max Reitz
  0 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-17 10:38 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-17 at 09:34, Max Reitz wrote:
> On 2014-11-15 at 15:50, Eric Blake wrote:
>> On 11/14/2014 06:06 AM, Max Reitz wrote:
>>> Add a test for conversion between different refcount widths and errors
>>> specific to certain widths (i.e. snapshots with refcount_width=1).
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   tests/qemu-iotests/112     | 252 
>>> +++++++++++++++++++++++++++++++++++++++++++++
>>>   tests/qemu-iotests/112.out | 131 +++++++++++++++++++++++
>>>   tests/qemu-iotests/group   |   1 +
>>>   3 files changed, 384 insertions(+)
>>>   create mode 100755 tests/qemu-iotests/112
>>>   create mode 100644 tests/qemu-iotests/112.out
>>>
>>> +echo
>>> +echo '=== Testing too many references for check ==='
>>> +echo
>>> +
>>> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
>>> +print_refcount_width
>>> +
>>> +# This cluster should be created at 0x50000
>>> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
>>> +# Now make the second L2 entry (the L2 table should be at 0x40000) 
>>> point to that
>>> +# cluster, so we have two references
>>> +poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
>>> +
>>> +# This should say "please use amend"
>>> +_check_test_img -r all
>>> +
>>> +# So we do that
>>> +$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
>>> +print_refcount_width
>>> +
>>> +# And try again
>>> +_check_test_img -r all
>> I think this section also deserves a test that fuzzes an image with
>> width=64 to intentionally set the most significant bit of one of the
>> refcounts, and make sure that we gracefully diagnose it as invalid.
>>
>>> +
>>> +echo
>>> +echo '=== Multiple walks necessary during amend ==='
>>> +echo
>>> +
>>> +IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" _make_test_img 
>>> 64k
>>> +
>>> +# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 
>>> table, a
>>> +# single L2 table, the reftable and a single refblock. This creates 
>>> 58 data
>>> +# clusters (actually, the L2 table is created here, too), so in 
>>> total there are
>>> +# then 63 used clusters in the image. With a refcount width of 64, 
>>> one refblock
>>> +# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), 
>>> so this will
>>> +# make the first target refblock have exactly one free entry.
>>> +$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
>>> +
>>> +# Now change the refcount width; since the first target refblock 
>>> has exactly one
>>> +# free entry, that entry will be used to store its own reference. 
>>> No other
>>> +# refblocks are needed, so then the new reftable will be allocated; 
>>> since the
>>> +# first target refblock is completely filled up, this will require 
>>> a new
>>> +# refblock which is why the refcount width changing function will 
>>> need to run
>>> +# through everything one more time until the allocations are stable.
>>> +$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
>>> +print_refcount_width
>> Umm, that sounds backwards from what you document.  It's a good test of
>> the _new_ reftable needing a second round of allocations.  So keep it
>> with corrected comments.  But I think you _intended_ to write a test
>> that starts with a refcount_width=64 image and resize to a
>> refcount_width=1, where the _old_ reftable then suffers a reallocation
>> as part of allocating refblocks for the new table.
>
> That's what you intended, but that's harder to test, so I settled for 
> this (and the comments are appropriate (note that "target refblock" 
> refers to the refblocks after amendment); note that this does indeed 
> fail with v1 of this series.
>
>> It may even help if
>> you add a tracepoint for every iteration through the walk function
>> callback, to prove we are indeed executing it 3 times instead of the
>> usual 2, for these test cases.
>
> That's a good idea! I thought about adding some info, but totally 
> forgot about trace points.

...On second thought, trace doesn't work so well with qemu-img. My best 
bet would be blkdebug, but that seems kind of ugly to me...

Max

> I'll see whether I can add a test for increasing the size of the 
> original reftable during amendment, too.
>
> Max

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-17 10:38       ` Max Reitz
@ 2014-11-17 11:02         ` Max Reitz
  0 siblings, 0 replies; 46+ messages in thread
From: Max Reitz @ 2014-11-17 11:02 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-17 at 11:38, Max Reitz wrote:
> On 2014-11-17 at 09:34, Max Reitz wrote:
>> On 2014-11-15 at 15:50, Eric Blake wrote:
>>> On 11/14/2014 06:06 AM, Max Reitz wrote:
>>>> Add a test for conversion between different refcount widths and errors
>>>> specific to certain widths (i.e. snapshots with refcount_width=1).
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>   tests/qemu-iotests/112     | 252 
>>>> +++++++++++++++++++++++++++++++++++++++++++++
>>>>   tests/qemu-iotests/112.out | 131 +++++++++++++++++++++++
>>>>   tests/qemu-iotests/group   |   1 +
>>>>   3 files changed, 384 insertions(+)
>>>>   create mode 100755 tests/qemu-iotests/112
>>>>   create mode 100644 tests/qemu-iotests/112.out
>>>>
>>>> +echo
>>>> +echo '=== Testing too many references for check ==='
>>>> +echo
>>>> +
>>>> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
>>>> +print_refcount_width
>>>> +
>>>> +# This cluster should be created at 0x50000
>>>> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
>>>> +# Now make the second L2 entry (the L2 table should be at 0x40000) 
>>>> point to that
>>>> +# cluster, so we have two references
>>>> +poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
>>>> +
>>>> +# This should say "please use amend"
>>>> +_check_test_img -r all
>>>> +
>>>> +# So we do that
>>>> +$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
>>>> +print_refcount_width
>>>> +
>>>> +# And try again
>>>> +_check_test_img -r all
>>> I think this section also deserves a test that fuzzes an image with
>>> width=64 to intentionally set the most significant bit of one of the
>>> refcounts, and make sure that we gracefully diagnose it as invalid.
>>>
>>>> +
>>>> +echo
>>>> +echo '=== Multiple walks necessary during amend ==='
>>>> +echo
>>>> +
>>>> +IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" 
>>>> _make_test_img 64k
>>>> +
>>>> +# Cluster 0 is the image header, clusters 1 to 4 are used by the 
>>>> L1 table, a
>>>> +# single L2 table, the reftable and a single refblock. This 
>>>> creates 58 data
>>>> +# clusters (actually, the L2 table is created here, too), so in 
>>>> total there are
>>>> +# then 63 used clusters in the image. With a refcount width of 64, 
>>>> one refblock
>>>> +# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), 
>>>> so this will
>>>> +# make the first target refblock have exactly one free entry.
>>>> +$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
>>>> +
>>>> +# Now change the refcount width; since the first target refblock 
>>>> has exactly one
>>>> +# free entry, that entry will be used to store its own reference. 
>>>> No other
>>>> +# refblocks are needed, so then the new reftable will be 
>>>> allocated; since the
>>>> +# first target refblock is completely filled up, this will require 
>>>> a new
>>>> +# refblock which is why the refcount width changing function will 
>>>> need to run
>>>> +# through everything one more time until the allocations are stable.
>>>> +$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
>>>> +print_refcount_width
>>> Umm, that sounds backwards from what you document.  It's a good test of
>>> the _new_ reftable needing a second round of allocations.  So keep it
>>> with corrected comments.  But I think you _intended_ to write a test
>>> that starts with a refcount_width=64 image and resize to a
>>> refcount_width=1, where the _old_ reftable then suffers a reallocation
>>> as part of allocating refblocks for the new table.
>>
>> That's what you intended, but that's harder to test, so I settled for 
>> this (and the comments are appropriate (note that "target refblock" 
>> refers to the refblocks after amendment); note that this does indeed 
>> fail with v1 of this series.
>>
>>> It may even help if
>>> you add a tracepoint for every iteration through the walk function
>>> callback, to prove we are indeed executing it 3 times instead of the
>>> usual 2, for these test cases.
>>
>> That's a good idea! I thought about adding some info, but totally 
>> forgot about trace points.
>
> ...On second thought, trace doesn't work so well with qemu-img. My 
> best bet would be blkdebug, but that seems kind of ugly to me...

Problem "solved": If there are more walks than originally expected 
(3+1 instead of 2+1), progress will regress at one point. I'll just grep 
for that point and that should be enough (progress jumping from 66.67 % 
to 50.00 %).
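
The arithmetic behind those two numbers can be sketched as follows. This assumes that every walk reports progress as walk index over total walks, and that the announced total is never below three; both are assumptions of this sketch and the names are hypothetical.

```c
#include <assert.h>

/* Hypothetical: total number of walks announced while allocation walk
 * `walk_index` runs (this walk, the final writing walk, at least 3) */
static int total_walks_sketch(int walk_index)
{
    int total = walk_index + 2;

    return total < 3 ? 3 : total;
}

/* Progress in percent when walk `walk_index` starts */
static double progress_at_walk_start(int walk_index)
{
    return 100.0 * walk_index / total_walks_sketch(walk_index);
}

/* Progress in percent when walk `walk_index` finishes */
static double progress_at_walk_end(int walk_index)
{
    return 100.0 * (walk_index + 1) / total_walks_sketch(walk_index);
}
```

Walk 1 ends at 2/3 of the announced total; if a third allocation walk turns out to be necessary, it starts at 2/4, which is the regression to grep for.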

Max

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-15 14:50   ` Eric Blake
  2014-11-17  8:34     ` Max Reitz
@ 2014-11-17 12:06     ` Max Reitz
  2014-11-18 20:26       ` Eric Blake
  1 sibling, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-17 12:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-15 at 15:50, Eric Blake wrote:
> On 11/14/2014 06:06 AM, Max Reitz wrote:
>> Add a test for conversion between different refcount widths and errors
>> specific to certain widths (i.e. snapshots with refcount_width=1).
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   tests/qemu-iotests/112     | 252 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/112.out | 131 +++++++++++++++++++++++
>>   tests/qemu-iotests/group   |   1 +
>>   3 files changed, 384 insertions(+)
>>   create mode 100755 tests/qemu-iotests/112
>>   create mode 100644 tests/qemu-iotests/112.out
>>
>> +echo
>> +echo '=== Testing too many references for check ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
>> +print_refcount_width
>> +
>> +# This cluster should be created at 0x50000
>> +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
>> +# Now make the second L2 entry (the L2 table should be at 0x40000) point to that
>> +# cluster, so we have two references
>> +poke_file "$TEST_IMG" $((0x40008)) "\x80\x00\x00\x00\x00\x05\x00\x00"
>> +
>> +# This should say "please use amend"
>> +_check_test_img -r all
>> +
>> +# So we do that
>> +$QEMU_IMG amend -o refcount_width=2 "$TEST_IMG"
>> +print_refcount_width
>> +
>> +# And try again
>> +_check_test_img -r all
> I think this section also deserves a test that fuzzes an image with
> width=64 to intentionally set the most significant bit of one of the
> refcounts, and make sure that we gracefully diagnose it as invalid.
>
>> +
>> +echo
>> +echo '=== Multiple walks necessary during amend ==='
>> +echo
>> +
>> +IMGOPTS="$IMGOPTS,refcount_width=1,cluster_size=512" _make_test_img 64k
>> +
>> +# Cluster 0 is the image header, clusters 1 to 4 are used by the L1 table, a
>> +# single L2 table, the reftable and a single refblock. This creates 58 data
>> +# clusters (actually, the L2 table is created here, too), so in total there are
>> +# then 63 used clusters in the image. With a refcount width of 64, one refblock
>> +# describes 64 clusters (512 bytes / 64 bits/entry = 64 entries), so this will
>> +# make the first target refblock have exactly one free entry.
>> +$QEMU_IO -c "write 0 $((58 * 512))" "$TEST_IMG" | _filter_qemu_io
>> +
>> +# Now change the refcount width; since the first target refblock has exactly one
>> +# free entry, that entry will be used to store its own reference. No other
>> +# refblocks are needed, so then the new reftable will be allocated; since the
>> +# first target refblock is completely filled up, this will require a new
>> +# refblock which is why the refcount width changing function will need to run
>> +# through everything one more time until the allocations are stable.
>> +$QEMU_IMG amend -o refcount_width=64 "$TEST_IMG"
>> +print_refcount_width
> Umm, that sounds backwards from what you document.  It's a good test of
> the _new_ reftable needing a second round of allocations.  So keep it
> with corrected comments.  But I think you _intended_ to write a test
> that starts with a refcount_width=64 image and resize to a
> refcount_width=1, where the _old_ reftable then suffers a reallocation
> as part of allocating refblocks for the new table.  It may even help if
> you add a tracepoint for every iteration through the walk function
> callback, to prove we are indeed executing it 3 times instead of the
> usual 2, for these test cases.

I'm currently thinking about a way to test the old reftable reallocation 
issue, and I can't find any. So, for the old reftable to require a 
reallocation it must grow. For it to grow we need some allocation beyond 
what it can currently represent. For this to happen during the refblock 
allocation walk, this allocation must be the allocation of a new refblock.

If the refblock is allocated beyond the current reftable's limit, this 
means that all clusters between free_cluster_index and that point 
are already taken. If the reftable is then reallocated, it will 
therefore *always* be allocated behind that refblock, which is beyond 
its old limit. Therefore, that walk through the old reftable will never 
miss that new allocation.

So the issue can only occur if the old reftable is resized after the 
walk through it, that is, when allocating the new reftable. That is 
indeed an issue but I think it manifests itself basically like the issue 
I'm testing here: There is now an area in the old refcount structures 
which was free before but is used now, and the allocation causing 
that was the allocation of the new reftable. The only difference is 
whether it's the old or the new reftable that resides in the 
previously free area. Thus, I think I'll leave it at this test – but if 
you can describe to me how to create an image for a different "rewalk" 
path, I'm all ears.

Max

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
  2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment Max Reitz
@ 2014-11-18 17:55   ` Eric Blake
  2014-11-18 18:58     ` Max Reitz
  0 siblings, 1 reply; 46+ messages in thread
From: Eric Blake @ 2014-11-18 17:55 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 5702 bytes --]

On 11/14/2014 06:06 AM, Max Reitz wrote:
> Add a function qcow2_change_refcount_order() which allows changing the
> refcount order of a qcow2 image.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 457 +++++++++++++++++++++++++++++++++++++++++++++++++
>  block/qcow2.h          |   4 +
>  2 files changed, 461 insertions(+)
> 

> +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,

> +
> +        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
> +                  (uint64_t)total * s->refcount_table_size, cb_opaque);

Not sure if the casts are needed (isn't s->refcount_table_size already
uint64_t, and 'int * uint64_t' does the right thing); but I guess it
doesn't hurt to leave them.

> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
> +                                BlockDriverAmendStatusCB *status_cb,
> +                                void *cb_opaque, Error **errp)
> +{

> +    do {
> +        int total_walks;
> +
> +        new_allocation = false;
> +
> +        /* At least we have to do this walk and the one which writes the
> +         * refblocks; also, at least we have to do this loop here at least
> +         * twice (normally), first to do the allocations, and second to
> +         * determine that everything is correctly allocated, this then makes
> +         * three walks in total */
> +        total_walks = MIN(walk_index + 2, 3);

This feels wrong...

> +
> +        /* First, allocate the structures so they are present in the refcount
> +         * structures */
> +        ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
> +                                 &new_reftable_size, NULL, new_refblock_size,
> +                                 new_refcount_bits, &alloc_refblock,
> +                                 &new_allocation, NULL, status_cb, cb_opaque,
> +                                 walk_index++, total_walks, errp);

...In the common case of just two iterations of the do loop (second
iteration confirms no allocations needed), you call with index 0/2, 1/3,
and then the later non-allocation walk is index 2/3.

In the rare case of three iterations of the do loop, you call with index
0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4.

I highly doubt that it is possible to trigger four iterations of the do
loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5.

I think you instead want to have:

total_walks = MAX(walk_index + 2, 3)

then the common case will call with 0/3, 1/3, and the later walk as 2/3

the three-iteration loop will call with 0/3, 1/3, 2/4, and the later
walk as 3/4

the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and
the later walk as 4/5.
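
The two formulas can be compared with a small standalone sketch.  It
does not touch qemu at all; total_walks_min(), total_walks_max() and
print_progress() are names made up here, and the only thing taken from
the patch is that the final refblock-writing walk reports
walk_index / (walk_index + 1).

```c
#include <stdio.h>

/* Total reported by the v2 patch: MIN(walk_index + 2, 3). */
static int total_walks_min(int walk_index)
{
    int candidate = walk_index + 2;
    return candidate < 3 ? candidate : 3;
}

/* Total proposed in this review: MAX(walk_index + 2, 3). */
static int total_walks_max(int walk_index)
{
    int candidate = walk_index + 2;
    return candidate > 3 ? candidate : 3;
}

/* Print the index/total pair of every allocation walk, followed by the
 * pair of the final walk that writes the refblocks. */
static void print_progress(int allocation_walks, int (*total)(int))
{
    int walk_index;

    for (walk_index = 0; walk_index < allocation_walks; walk_index++) {
        printf("%d/%d ", walk_index, total(walk_index));
    }
    printf("then %d/%d\n", walk_index, walk_index + 1);
}
```

Calling print_progress(3, total_walks_min) shows the total staying at 3
while the index catches up with it, whereas total_walks_max keeps the
total strictly larger than the index for every allocation walk.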

> +
> +        new_reftable_index = 0;
> +
> +        if (new_allocation) {
> +            if (new_reftable_offset) {
> +                qcow2_free_clusters(bs, new_reftable_offset,
> +                                    allocated_reftable_size * sizeof(uint64_t),
> +                                    QCOW2_DISCARD_NEVER);

Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?
 Why not punch holes in the file when throwing out a failed too-small
new table, or when cleaning up the old table once the new table is good?

> +            }
> +
> +            new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
> +                                                           sizeof(uint64_t));
> +            if (new_reftable_offset < 0) {
> +                error_setg_errno(errp, -new_reftable_offset,
> +                                 "Failed to allocate the new reftable");
> +                ret = new_reftable_offset;
> +                goto done;
> +            }
> +            allocated_reftable_size = new_reftable_size;
> +
> +            new_allocation = true;

This assignment is dead code (it already occurs inside an 'if
(new_allocation)' condition).

> +        }
> +    } while (new_allocation);
> +
> +    /* Second, write the new refblocks */
> +    new_allocation = false;

This assignment is dead code (it can only be reached if the earlier do
loop ended, which is only possible when no allocations are recorded).

> +    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
> +                             &new_reftable_size, new_refblock,
> +                             new_refblock_size, new_refcount_bits,
> +                             &flush_refblock, &new_allocation, new_set_refcount,
> +                             status_cb, cb_opaque, walk_index, walk_index + 1,
> +                             errp);
> +    if (ret < 0) {
> +        goto done;
> +    }
> +    assert(!new_allocation);
> +

Correct.

> +done:
> +    if (new_reftable) {
> +        /* On success, new_reftable actually points to the old reftable (and
> +         * new_reftable_size is the old reftable's size); but that is just
> +         * fine */
> +        for (i = 0; i < new_reftable_size; i++) {
> +            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
> +            if (offset) {
> +                qcow2_free_clusters(bs, offset, s->cluster_size,
> +                                    QCOW2_DISCARD_NEVER);

Again, why the QCOW2_DISCARD_NEVER policy?

Fix the MIN vs. MAX bug, and the two dead assignment statements, and you
can have:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
  2014-11-18 17:55   ` Eric Blake
@ 2014-11-18 18:58     ` Max Reitz
  2014-11-18 19:56       ` Eric Blake
  0 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-18 18:58 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 18.11.2014 18:55, Eric Blake wrote:
> On 11/14/2014 06:06 AM, Max Reitz wrote:
>> Add a function qcow2_change_refcount_order() which allows changing the
>> refcount order of a qcow2 image.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 457 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/qcow2.h          |   4 +
>>   2 files changed, 461 insertions(+)
>>
>> +static int walk_over_reftable(BlockDriverState *bs, uint64_t **new_reftable,
>> +
>> +        status_cb(bs, (uint64_t)index * s->refcount_table_size + reftable_index,
>> +                  (uint64_t)total * s->refcount_table_size, cb_opaque);
> Not sure if the casts are needed (isn't s->refcount_table_size already
> uint64_t,

Surprise, it isn't. I thought otherwise, too, but then got told by 
clang_complete (it's uint32_t).
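
With a 32-bit field, the cast decides whether the product is computed
in 32 or in 64 bits.  A minimal sketch, independent of qemu and
assuming the usual platforms where int is 32 bits wide (both function
names are invented for illustration):

```c
#include <stdint.h>

/* Without a cast, 'int * uint32_t' converts the int to a 32-bit unsigned
 * value, so the product wraps modulo 2^32 before it is widened. */
static uint64_t progress_without_cast(int index, uint32_t table_size)
{
    return index * table_size;
}

/* Casting one operand first makes the whole multiplication 64 bits wide. */
static uint64_t progress_with_cast(int index, uint32_t table_size)
{
    return (uint64_t)index * table_size;
}
```

For realistic reftable sizes the two agree; they only diverge once
index * table_size no longer fits into 32 bits.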

> and 'int * uint64_t' does the right thing); but I guess it
> doesn't hurt to leave them.
>
>> +int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
>> +                                BlockDriverAmendStatusCB *status_cb,
>> +                                void *cb_opaque, Error **errp)
>> +{
>> +    do {
>> +        int total_walks;
>> +
>> +        new_allocation = false;
>> +
>> +        /* At least we have to do this walk and the one which writes the
>> +         * refblocks; also, at least we have to do this loop here at least
>> +         * twice (normally), first to do the allocations, and second to
>> +         * determine that everything is correctly allocated, this then makes
>> +         * three walks in total */
>> +        total_walks = MIN(walk_index + 2, 3);
> This feels wrong...

Yes, I noticed already when preparing v3 and it's already fixed in my 
local v3 branch. *cough*

>> +
>> +        /* First, allocate the structures so they are present in the refcount
>> +         * structures */
>> +        ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
>> +                                 &new_reftable_size, NULL, new_refblock_size,
>> +                                 new_refcount_bits, &alloc_refblock,
>> +                                 &new_allocation, NULL, status_cb, cb_opaque,
>> +                                 walk_index++, total_walks, errp);
> ...In the common case of just two iterations of the do loop (second
> iteration confirms no allocations needed), you call with index 0/2, 1/3,
> and then the later non-allocation walk is index 2/3.
>
> In the rare case of three iterations of the do loop, you call with index
> 0/2, 1/3, 2/3, and then the later non-allocation walk is 3/4.
>
> I highly doubt that it is possible to trigger four iterations of the do
> loop, but if it were, you would call with 0/2, 1/3, 2/3, 3/3, and then 4/5.
>
> I think you instead want to have:
>
> total_walks = MAX(walk_index + 2, 3)
>
> then the common case will call with 0/3, 1/3, and the later walk as 2/3
>
> the three-iteration loop will call with 0/3, 1/3, 2/4, and the later
> walk as 3/4
>
> the unlikely four-iteration loop will call with 0/3, 1/3, 2/4, 3/5, and
> the later walk as 4/5.
>
>> +
>> +        new_reftable_index = 0;
>> +
>> +        if (new_allocation) {
>> +            if (new_reftable_offset) {
>> +                qcow2_free_clusters(bs, new_reftable_offset,
>> +                                    allocated_reftable_size * sizeof(uint64_t),
>> +                                    QCOW2_DISCARD_NEVER);
> Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?

Ah, discarding is always interesting... Last year I used 
QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to 
use ALWAYS unless one is really sure about it. I could have used 
QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like 
this is that the clusters may get picked up by the following allocation, 
in which case having discarded them is not a good idea (there are some 
other places in the qcow2 code which use NEVER for the same reason).

So, in this case, I think NEVER is good.

> Why not punch holes in the file when throwing out a failed too-small
> new table, or when cleaning up the old table once the new table is good?
>
>> +            }
>> +
>> +            new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
>> +                                                           sizeof(uint64_t));
>> +            if (new_reftable_offset < 0) {
>> +                error_setg_errno(errp, -new_reftable_offset,
>> +                                 "Failed to allocate the new reftable");
>> +                ret = new_reftable_offset;
>> +                goto done;
>> +            }
>> +            allocated_reftable_size = new_reftable_size;
>> +
>> +            new_allocation = true;
> This assignment is dead code (it already occurs inside an 'if
> (new_allocation)' condition).

Right. Though I somehow like its explicitness... I'll remove it.

>> +        }
>> +    } while (new_allocation);
>> +
>> +    /* Second, write the new refblocks */
>> +    new_allocation = false;
> This assignment is dead code (it can only be reached if the earlier do
> loop ended, which is only possible when no allocations are recorded).

Right again.

>> +    ret = walk_over_reftable(bs, &new_reftable, &new_reftable_index,
>> +                             &new_reftable_size, new_refblock,
>> +                             new_refblock_size, new_refcount_bits,
>> +                             &flush_refblock, &new_allocation, new_set_refcount,
>> +                             status_cb, cb_opaque, walk_index, walk_index + 1,
>> +                             errp);
>> +    if (ret < 0) {
>> +        goto done;
>> +    }
>> +    assert(!new_allocation);
>> +
> Correct.
>
>> +done:
>> +    if (new_reftable) {
>> +        /* On success, new_reftable actually points to the old reftable (and
>> +         * new_reftable_size is the old reftable's size); but that is just
>> +         * fine */
>> +        for (i = 0; i < new_reftable_size; i++) {
>> +            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
>> +            if (offset) {
>> +                qcow2_free_clusters(bs, offset, s->cluster_size,
>> +                                    QCOW2_DISCARD_NEVER);
> Again, why the QCOW2_DISCARD_NEVER policy?

Here, I have nothing to justify it. I'll use QCOW2_DISCARD_OTHER in v3.

> Fix the MIN vs. MAX bug, and the two dead assignment statements, and you
> can have:
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

I'll also use QCOW2_DISCARD_OTHER for freeing the refblocks and the 
reftable after the "done" label, if you're fine with that.

Once again, thanks a lot!

Max


* Re: [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment
  2014-11-18 18:58     ` Max Reitz
@ 2014-11-18 19:56       ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-18 19:56 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/18/2014 11:58 AM, Max Reitz wrote:
> On 18.11.2014 18:55, Eric Blake wrote:
>> On 11/14/2014 06:06 AM, Max Reitz wrote:
>>> Add a function qcow2_change_refcount_order() which allows changing the
>>> refcount order of a qcow2 image.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---

>>> +        if (new_allocation) {
>>> +            if (new_reftable_offset) {
>>> +                qcow2_free_clusters(bs, new_reftable_offset,
>>> +                                    allocated_reftable_size *
>>> sizeof(uint64_t),
>>> +                                    QCOW2_DISCARD_NEVER);
>> Any reason you picked QCOW2_DISCARD_NEVER instead of some other policy?
> 
> Ah, discarding is always interesting... Last year I used
> QCOW2_DISCARD_ALWAYS, then asked Kevin and he basically said never to
> use ALWAYS unless one is really sure about it. I could have used
> QCOW2_DISCARD_OTHER... But the idea behind using NEVER in cases like
> this is that the clusters may get picked up by the following allocation,
> in which case having discarded them is not a good idea (there are some
> other places in the qcow2 code which use NEVER for the same reason).
> 
> So, in this case, I think NEVER is good.

Makes sense.  Yes, for THIS case, we are probably going to reuse the
just-discarded cluster on the very next walk, so it's not worth punching
a hole just to reinstate it.


>>> +done:
>>> +    if (new_reftable) {
>>> +        /* On success, new_reftable actually points to the old
>>> reftable (and
>>> +         * new_reftable_size is the old reftable's size); but that
>>> is just
>>> +         * fine */
>>> +        for (i = 0; i < new_reftable_size; i++) {
>>> +            uint64_t offset = new_reftable[i] & REFT_OFFSET_MASK;
>>> +            if (offset) {
>>> +                qcow2_free_clusters(bs, offset, s->cluster_size,
>>> +                                    QCOW2_DISCARD_NEVER);
>> Again, why the QCOW2_DISCARD_NEVER policy?
> 
> Here, I have nothing to justify it. I'll use QCOW2_DISCARD_OTHER in v3.

Thanks, and now I know a bit more about discard policy.

> 
>> Fix the MIN vs. MAX bug, and the two dead assignment statements, and you
>> can have:
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
> I'll also use QCOW2_DISCARD_OTHER for freeing the refblocks and the
> reftable after the "done" label, if you're fine with that.

Yes, works for me.

> 
> Once again, thanks a lot!

And thank you for a mentally engaging review :)  I'm still in the middle
of an email on a possible test you can write to provoke a different
3-pass scenario thanks to all-zero refblocks, so you may want to wait
for that before posting v3...

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-17 12:06     ` Max Reitz
@ 2014-11-18 20:26       ` Eric Blake
  2014-11-19  5:52         ` Eric Blake
  2014-11-20 13:48         ` Max Reitz
  0 siblings, 2 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-18 20:26 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/17/2014 05:06 AM, Max Reitz wrote:

>> Umm, that sounds backwards from what you document.  It's a good test of
>> the _new_ reftable needing a second round of allocations.  So keep it
>> with corrected comments.  But I think you _intended_ to write a test
>> that starts with a refcount_width=64 image and resize to a
>> refcount_width=1, where the _old_ reftable then suffers a reallocation
>> as part of allocating refblocks for the new table.  It may even help if
>> you add a tracepoint for every iteration through the walk function
>> callback, to prove we are indeed executing it 3 times instead of the
>> usual 2, for these test cases.
> 
> I'm currently thinking about a way to test the old reftable reallocation
> issue, and I can't find any. So, for the old reftable to require a
> reallocation it must grow. For it to grow we need some allocation beyond
> what it can currently represent. For this to happen during the refblock
> allocation walk, this allocation must be the allocation of a new refblock.
> 
> If the refblock is allocated beyond the current reftable's limit, this
> means that all clusters between free_cluster_index and that point
> are already taken. If the reftable is then reallocated, it will
> therefore *always* be allocated behind that refblock, which is beyond
> its old limit. Therefore, that walk through the old reftable will never
> miss that new allocation.
> 
> So the issue can only occur if the old reftable is resized after the
> walk through it, that is, when allocating the new reftable. That is
> indeed an issue but I think it manifests itself basically like the issue
> I'm testing here: There is now an area in the old refcount structures
> which was free before but is used now, and the allocation causing
> that was the allocation of the new reftable. The only difference is
> whether it's the old or the new reftable that resides in the
> previously free area. Thus, I think I'll leave it at this test – but if
> you can describe to me how to create an image for a different "rewalk"
> path, I'm all ears.

=====
The test you wrote does:

original image, pre-walk:
reftable is one cluster; with one refblock and 63 zero entries
 that refblock holds 4096 width-1 refcounts; of those, the first 63 are
non-zero, the remaining are zero. Image is 32256 bytes long

During the first walk, we call operation() 64 times - the first time
with refblock_empty false, the remaining 63 times with refblock_empty true.

after first walk but before reftable allocation, we have allocated one
refblock that holds 64 width-64 refcounts (all zero, because we don't
populate them until the final walk); and the old table now has 64
refcounts populated. Image is 32768 bytes long.

Then we allocate a new reftable; so far, we only created one refblock
for it to hold, so one cluster is sufficient. The allocation causes the
old table to now have 65 refcounts populated. Image is now 33280 bytes long.

On the second pass, we call operation() 64 times; now the first two
calls have refblock_empty as false, which means we allocate a new
refblock.  This allocation causes the old table to now have 66 refcounts
populated. Image is now 33792 bytes long.

So we free our first attempt at a new reftable, and allocate another (a
single cluster is still sufficient to hold two refblocks); I'm not sure
whether this free/realloc will reuse cluster 65 or if it will pick up
cluster 67 and leave a hole in 65.  [I guess it depends on whether
cluster allocation is done by first-fit analysis or whether it blindly
favors allocating at the end of the image].  Either way, we have to do a
third iteration, because the second iteration allocated a refblock and
"reallocated" a reftable.

On the third pass, operation() is still called 64 times, but because the
only two calls with refblock_empty as false already have an allocated
refblock, no further allocations are needed, and we are done with the do
loop; the fourth walk can set refcounts.

=====
The test I thought you were writing would start

original image, pre-walk:
reftable is one cluster; with one refblock and 63 zero entries
 that refblock holds 64 width-64 refcounts; of those, the first 63 are
non-zero, the remaining are zero. Image is 32256 bytes long

During the first walk, we call operation() 1 time, with refblock_empty
false.

after first walk but before reftable allocation, we have allocated one
refblock that holds 4096 width-1 refcounts (all zero, because we don't
populate them until the final walk); and the old table now has 64
refcounts populated. Image is 32768 bytes long.

Then we allocate a new reftable; so far, we only created one refblock
for it to hold, so one cluster is sufficient. The allocation causes the
old table to now have 66 refcounts populated (one for the new refblock,
but also one for an additional refblock in the old table because the
first refblock was full). Image is now 33792 bytes long.

On the second pass, we call operation() 1 time with refblock_empty as
false, so we don't need any allocation.

Which means the test you wrote is correct, while my idea does NOT
trigger the third walk, at least not for the initial file size of 32256.
 You've been vindicated, you did it correctly :)
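
The call counts in both walk-throughs follow from how many refcount
entries fit into one 512-byte cluster at each width.  A rough sketch of
that arithmetic (the helper names are mine, not qemu's):

```c
#include <stdint.h>

/* Number of refcount entries that fit into one refblock cluster. */
static uint64_t refblock_entries(uint64_t cluster_size, unsigned refcount_bits)
{
    return cluster_size * 8 / refcount_bits;
}

/* Number of new-width refblocks needed to cover the clusters described
 * by a single old-width refblock, rounded up.  This is how many times
 * operation() gets called per old refblock in the description above. */
static uint64_t new_refblocks_per_old(uint64_t cluster_size,
                                      unsigned old_bits, unsigned new_bits)
{
    uint64_t old_entries = refblock_entries(cluster_size, old_bits);
    uint64_t new_entries = refblock_entries(cluster_size, new_bits);

    return (old_entries + new_entries - 1) / new_entries;
}
```

With 512-byte clusters, going from width 1 to width 64 gives 64 calls
per old refblock, and going from width 64 to width 1 gives a single
call.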


=====
Now, in response to your question about some other 3-pass inducing
pattern, let's think back to v1, where you questioned what would happen
if a hole in the reftable gets turned into data due to a later
allocation.  Let's see if I can come up with a scenario for that...

Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
as our two sizes.  If we downsize from 64 to 32 bits, then every two
refblock clusters in the old table results in one call to operation()
for the new table; conversely, if we upsize, then every refblock cluster
in the old table gives two calls to operation() in the new table.  The
trick at hand is to come up with some image where we punch a hole so
that on the first pass, we call operation() with refblock_empty true for
one iteration (necessarily a call later than the first, since the image
header guarantees the first refblock is not empty), but where we have
data after the hole, where it is the later data that triggers the
allocation that will finally start to fill the hole.

How about starting with an image that occupies between 1.5 and 2
refblocks worth of 32-width clusters (so an image anywhere between 193
and 256 clusters, or between 98816 and 131072 bytes).  You should be
able to figure out how many clusters this consumes for L1, L2, plus 1
for header, reftable, and 2 for refblocks, in order to figure out how
many remaining clusters are dedicated to data; ideally, the data
clusters are contiguous, and occupy a swath that covers at least
clusters 126 through 192.  Widening to 64-bit width will require 4
refblocks instead of 2, if all refblocks are needed.  But the whole idea
of punching a hole is that we don't need a refblock if it will be
all-zero entries.  So take this original image, and discard the data
clusters from physical index 126 through 191 (this is NOT the data
visible at guest offset 64512, but whatever actual offset of guest data
that maps to physical offset 64512).  The old reftable now looks like {
refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
empty, 192-whatever occupied, tail empty] }.  With no allocations
required, this in turn would map to the following new refblocks: {
refblock_n1 [0-63 occupied], refblock_n2 [64-125 occupied, 126-127
empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
not need to allocate refblock_n3 because of the hole in the old
refblock; we DO end up allocating three refblocks, but in the sequence
of things, refblock_n1 and refblock_n2 are allocated while we are
visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
not allocated until after we have already passed over the first half of
refblock_o2.

Thus, the second walk over the image will see that we need to allocate
refblock_n3 because it now contains entries (in particular, the entry
for refblock_n4, but also the 1-cluster entry for the proposed reftable
that is allocated between the walks).  So, while your test used the
allocation of the reftable as the spillover point, my scenario here uses
the allocation of later refblocks as the spillover point that got missed
during the first iteration.
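
To make the hole argument concrete, here is a sketch of the "does this
new refblock cover anything?" check, using 64 entries per refblock as
in the 64-bit case above.  struct cluster_range and refblock_needed()
are hypothetical helpers for illustration, not qemu code, and placing
refblock_n4 in cluster 128 is my assumption about where the first free
cluster would be.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical run of occupied clusters, both ends inclusive. */
struct cluster_range {
    uint64_t first;
    uint64_t last;
};

/* True if the new refblock with the given index covers at least one
 * occupied cluster, i.e. it cannot be left out as all-zero. */
static bool refblock_needed(uint64_t refblock_index,
                            uint64_t entries_per_refblock,
                            const struct cluster_range *used, int n_used)
{
    uint64_t first = refblock_index * entries_per_refblock;
    uint64_t last = first + entries_per_refblock - 1;

    for (int i = 0; i < n_used; i++) {
        if (used[i].first <= last && used[i].last >= first) {
            return true;
        }
    }
    return false;
}
```

With clusters 0-125 and 192-200 occupied, indexes 0, 1 and 3 are
needed and index 2 is not.  Once an allocation lands in cluster 128,
index 2 becomes needed as well, which is what the second walk notices.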


which means the reftable now looks like { refblock1, NULL, refblock3,
NULL... }; and where refblock1 now has at least two free entries
(possibly three, if the just-freed refblock2 happened to live before
cluster 62).  is we can also free refblock2

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-18 20:26       ` Eric Blake
@ 2014-11-19  5:52         ` Eric Blake
  2014-11-20 14:03           ` Max Reitz
  2014-11-20 13:48         ` Max Reitz
  1 sibling, 1 reply; 46+ messages in thread
From: Eric Blake @ 2014-11-19  5:52 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/18/2014 01:26 PM, Eric Blake wrote:

> Now, in response to your question about some other 3-pass inducing
> pattern, let's think back to v1, where you questioned what would happen
> if a hole in the reftable gets turned into data due to a later
> allocation.  Let's see if I can come up with a scenario for that...
> 
> Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
> as our two sizes.  If we downsize from 64 to 32 bits, then every two
> refblock clusters in the old table results in one call to operation()
> for the new table; conversely, if we upsize, then every refblock cluster
> in the old table gives two calls to operation() in the new table.  The
> trick at hand is to come up with some image where we punch a hole so
> that on the first pass, we call operation() with refblock_empty true for
> one iteration (necessarily a call later than the first, since the image
> header guarantees the first refblock is not empty), but where we have
> data after the hole, where it is the later data that triggers the
> allocation that will finally start to fill the hole.
> 
> How about starting with an image that occupies between 1.5 and 2
> refblocks worth of 32-width clusters (so an image anywhere between 193
> and 256 clusters, or between 98816 and 131072 bytes).  You should be
> able to figure out how many clusters this consumes for L1, L2, plus 1
> for header, reftable, and 2 for refblocks, in order to figure out how
> many remaining clusters are dedicated to data; ideally, the data
> clusters are contiguous, and occupy a swath that covers at least
> clusters 126 through 192.  Widening to 64-bit width will require 4
> refblocks instead of 2, if all refblocks are needed.  But the whole idea
> of punching a hole is that we don't need a refblock if it will be
> all-zero entries.  So take this original image, and discard the data
> clusters from physical index 126 through 191 (this is NOT the data
> visible at guest offset 64512, but whatever actual offset of guest data
> that maps to physical offset 64512).  The old reftable now looks like {
> refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
> empty, 192-whatever occupied, tail empty] }.  With no allocations
> required, this in turn would map to the following new refblocks: {
> refblock_n1 [0-63 occupied], refblock_n2 [64-125 occupied, 126-127
> empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
> not need to allocate refblock_n3 because of the hole in the old
> refblock; we DO end up allocating three refblocks, but in the sequence
> of things, refblock_n1 and refblock_n2 are allocated while we are
> visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
> not allocated until after we have already passed over the first half of
> refblock_o2.
> 
> Thus, the second walk over the image will see that we need to allocate
> refblock_n3 because it now contains entries (in particular, the entry
> for refblock_n4, but also the 1-cluster entry for the proposed reftable
> that is allocated between the walks).  So, while your test used the
> allocation of the reftable as the spillover point, my scenario here uses
> the allocation of later refblocks as the spillover point that got missed
> during the first iteration.
> 

Oops,...

> 
> which means the reftable now looks like { refblock1, NULL, refblock3,
> NULL... }; and where refblock1 now has at least two free entries
> (possibly three, if the just-freed refblock2 happened to live before
> cluster 62).  is we can also free refblock2
> 

...forgot to delete these random thoughts that I typed up but no longer
needed after reworking the above text.

At any rate, I'm not certain we can come up with a four-pass scenario;
if it is even possible, it would be quite complex.  Back in v1, I
questioned what would happen with a completely full reftable, where the
mere allocation of the new reftable causes a spillover not only in the
size of the new reftable, but also to the old.  Let's try to find a
scenario where the reftable does NOT spill on the first pass, but DOES
spill on the second.  A 32-bit width image spills from 1 to 2 clusters
for the reftable at the boundary of 64->65 refblocks (8192->8193
clusters); at that same cluster boundary, the reftable for a 64-bit
width table spills from 2 to 3 clusters.  Furthermore, a 64-bit width
refblock misses an allocation on the first pass if there is a hole of 64
aligned clusters.
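
Those spill boundaries can be double-checked with a short sketch.  It
assumes 8-byte reftable entries and that every refblock is allocated;
div_round_up() and reftable_clusters() are names invented here.

```c
#include <stdint.h>

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Number of clusters the reftable needs in order to describe an image
 * of image_clusters clusters, if no refblock can be left out. */
static uint64_t reftable_clusters(uint64_t image_clusters,
                                  uint64_t cluster_size,
                                  unsigned refcount_bits)
{
    uint64_t entries_per_refblock = cluster_size * 8 / refcount_bits;
    uint64_t refblocks = div_round_up(image_clusters, entries_per_refblock);
    uint64_t entries_per_reftable_cluster = cluster_size / sizeof(uint64_t);

    return div_round_up(refblocks, entries_per_reftable_cluster);
}
```

For 512-byte clusters this gives 1 and 2 reftable clusters on either
side of the 8192-cluster boundary at 32-bit width, and 2 and 3 at
64-bit width.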

First try:

We want an image that has roughly 128 free clusters on the first pass,
including a hole of at least 64 aligned clusters, to where a second pass
is needed to cover the refblock that was treated as a hole on the first
pass.  Furthermore, we want the second pass to completely fill the
image, so that the reftable allocation of 2 clusters after the walk is
the trigger of the spill, then the third pass will allocate another
refblock because of the spill, and a fourth pass is required to ensure
no allocations before the final pass of assigning refcounts.

Start with a 32-bit width image with clusters 0-125 allocated, 126-191
empty, then 192-8127 allocated (then 8128-8191 are unallocated to pad
out to cluster boundary) - describing a file of size 4161536.  This
image has 130 clusters spare before it spills, with a hole of 66
clusters near the front, and a tail of 64 clusters; since there is no
block of 128 aligned unallocated clusters, all 64 refblocks are in use.
 Now widen this image to 64-bit width refcounts.

On the first walk, we allocate two refblocks for clusters 0-127 (using
free slots 126 and 127), then leave an unallocated refblock for clusters
128-191, then allocate 124 more refblocks for clusters 192-8127 (using
free slots 128-191, as well as 8128-8187); this walk also picks up a
refblock for clusters 8128-8191 (using free slot 8188) because that area
of the file is allocated before the walk reaches that point, even if it
was a large enough hole to not need a refblock at the beginning of the
walk.  So we end the walk with 3 free clusters, and proceed to allocate
a 2-cluster reftable for the 127 refblocks that we created (using free
slots 8189-8190).

On the second walk, we allocate a refblock for clusters 128-191 (slot
8191). At the end of the walk, we free the 2-cluster reftable, then
reallocate; but as we still only need 2 clusters, we reuse slots
8189-8190.  Bummer - a third pass doesn't need allocation.
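
The bookkeeping for this attempt can be sketched as a simple budget of
free clusters below the spill point.  struct alloc_run and
spare_clusters() are hypothetical helpers, and "capacity" is my name
for the cluster count at which the 32-bit reftable would have to grow
(8192 here).

```c
#include <stdint.h>

/* Hypothetical run of allocated clusters, both ends inclusive. */
struct alloc_run {
    uint64_t first;
    uint64_t last;
};

/* Clusters still free below 'capacity'.  The runs are assumed not to
 * overlap and to lie entirely below 'capacity'. */
static uint64_t spare_clusters(uint64_t capacity,
                               const struct alloc_run *runs, int n_runs)
{
    uint64_t used = 0;

    for (int i = 0; i < n_runs; i++) {
        used += runs[i].last - runs[i].first + 1;
    }
    return capacity - used;
}
```

For the layout above (0-125 and 192-8127 allocated) this yields 130
spare clusters.  The first walk consumes 127 of them for refblocks, the
reftable takes 2, and the second walk takes the last one, so everything
fits exactly and nothing is left to force another allocation.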

Second try:

Tweak the input by one cluster: 0-125 allocated, 126-191 empty, 192-8128
allocated (8129-8191 unallocated); file size 4162048.  129 spare
clusters, with hole of 66 and tail of 63.  On the first walk, we
allocate 127 refblocks (ending with the use of free slots 8129-8189),
then allocate a 2-cluster reftable (free slots 8190-8191).  On the
second walk, we allocate a refblock for clusters 128-191 (slot 8192 -
which triggers a spill of the old reftable), and a refblock for clusters
8192-8255 (since we already spilled).  After the walk, we now allocate a
3-cluster reftable, but it fits nicely within 8192-8255.  Bummer - a
third pass still doesn't need an allocation.

Third try:

Tweak the input by an entire reftable cluster.  Let's start with a
32-bit image with a 2-cluster reftable, and try for the spill of a
64-bit image from 3->4 clusters (happens at the boundary from cluster
12287->12288).  We want around 192 free clusters, plus the space for the
reftable, and still want a hole of at least 64 aligned clusters to
trigger the second pass as the one that fills to the spilling point.

So the image has clusters 0-125 allocated, 126-191 empty, then 192-12157
allocated (then 12158-16383 are unallocated to pad out to cluster
boundary) - describing a file of size 6224896.  This image has 196
clusters spare before it spills the 64-bit reftable, with a hole of 66
clusters near the front, and a tail of 130 clusters.
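
As a sanity check of the third-try layout, the following sketch
recomputes the spill boundary and the spare-cluster count.  It assumes
512-byte clusters and 8-byte reftable entries; spill_boundary() is a
made-up helper name.

```python
CLUSTER_SIZE = 512
ENTRIES_PER_REFTABLE_CLUSTER = CLUSTER_SIZE // 8   # 64 entries per cluster


def spill_boundary(reftable_clusters, width_bits):
    """First cluster index that no longer fits the given reftable size."""
    per_refblock = CLUSTER_SIZE * 8 // width_bits
    return reftable_clusters * ENTRIES_PER_REFTABLE_CLUSTER * per_refblock


boundary_64 = spill_boundary(3, 64)   # 64-bit table spills from 3 to 4 clusters
boundary_32 = spill_boundary(2, 32)   # coverage of the 2-cluster 32-bit table

allocated = len(range(0, 126)) + len(range(192, 12158))
spare = boundary_64 - allocated
hole = len(range(126, 192))
tail = boundary_64 - 12158
file_size = 12158 * CLUSTER_SIZE

print(boundary_64, boundary_32, spare, hole, tail, file_size)
```

This gives a 64-bit boundary at cluster 12288, a 32-bit coverage of
16384 clusters, 196 spare clusters (66 + 130), and a file size of
6224896 bytes.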

On the first walk, we allocate two refblocks for clusters 0-127 (using
free slots 126 and 127), then leave an unallocated refblock for clusters
128-191, then allocate 187 more refblocks for clusters 192-12157 (using
free slots 128-191, as well as 12158-12280); this walk also triggers a
refblock allocation for the 32-bit table (slot 12281 for clusters
12160-12287) and 2 refblocks for the 64-bit table (slots 12282-12283 for
clusters 12158-12287) because that area of the file is allocated before
the walk reaches that point.  So we end the walk with 4 free clusters,
and proceed to allocate a 3-cluster reftable for the 191 refblocks that
we created (using free slots 12284-12286).

On the second walk, we allocate a refblock for clusters 128-191 (slot
12287). At the end of the walk, we free the 3-cluster reftable, then
reallocate; but as we still only need 3 clusters, we reuse slots
12284-12286.  Bummer - a third pass doesn't need allocation.

I'm sensing a pattern here - trying to go for reftable spills isn't
going to help - by the second pass, we already have a reftable
reservation (which we free and then reallocate), and the only time we'd
need a larger reftable after the second walk is if we encountered extra
refblocks during the walk, but the only way to get extra refblocks
during the walk is to spill the old table during the second walk, but in
that case the second walk picks up the spill.

Fourth try:

Even trying to play with larger files with larger holes won't easily
help.  Suppose I have a file that has 0-62 allocated, then 63-4159
unallocated, followed by 262144 clusters allocated (an image over 128M
in size).  On the first walk, we allocate a refblock (slot 63) for
clusters 0-63, then pass over 64 refblocks that don't need allocation,
then allocate 4096 refblocks (slots 64-4159).  Not even considering the
slots that are added in the old table, this means the second pass will
allocate an additional 64 refblocks.  But this allocation will NOT spill
the size of the reftable, because the reftable is already sized large
enough to cover the refblocks for the 128M tail of the file.

What if we have a larger hole, and also tweak the input file so that
there are no missing refblocks of the old table (the holes are only
possible in the new table).  That is, start with 0-62 allocated, 63-127
free, then a repeating pattern of 64 allocated clusters and 64 free,
over the course of 130 repetitions.  Follow that with 262144 allocated
clusters.  This means we have 65 + 130*64 == 8385 free clusters, to
dedicate solely to the new reftable; while the image occupies
(131*128)+262144 == 278912 clusters (requires 4358 refblock entries,
which in turn requires 69 contiguous clusters for the reftable).
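
The counts in this layout are plain ceiling divisions and can be
verified as follows.  This is only an arithmetic sketch assuming 64
refcounts per new refblock and 64 entries per reftable cluster;
ceil_div() is a local helper.

```python
def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


REFCOUNTS_PER_REFBLOCK = 64          # 64-bit refcounts, 512-byte clusters
ENTRIES_PER_REFTABLE_CLUSTER = 64    # 8-byte entries, 512-byte clusters

free_clusters = 65 + 130 * 64
image_clusters = 131 * 128 + 262144
refblock_entries = ceil_div(image_clusters, REFCOUNTS_PER_REFBLOCK)
reftable_clusters = ceil_div(refblock_entries, ENTRIES_PER_REFTABLE_CLUSTER)

# Figures quoted in the following paragraph: the image after the new
# reftable has been appended at the end.
grown_clusters = image_clusters + reftable_clusters
grown_refblock_entries = ceil_div(grown_clusters, REFCOUNTS_PER_REFBLOCK)
grown_reftable_clusters = ceil_div(grown_refblock_entries,
                                   ENTRIES_PER_REFTABLE_CLUSTER)

print(free_clusters, image_clusters, refblock_entries, reftable_clusters)
print(grown_clusters, grown_refblock_entries, grown_reftable_clusters)
```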

On the first walk, we alternate between allocating a refblock and
leaving a hole for a repetition of 131 times, then allocate 4096
refblocks.  Thus, we have consumed 4227 allocations out of the 8385 free
clusters; where the old table is now using the space of 67 of the
allocation holes.  Furthermore, the reftable allocation is too large to
fit in any of the holes, so it gets appended at the end of the image
(the image is now 278981 clusters, requiring 4360 refblock entries, but
still only 69 clusters for the reftable).  On the second walk, we
allocate 67 more refblocks for the refcounts in the old table that we
first treated as holes, plus 2 more refblocks for the tail of the file
holding the new reftable; these allocations still fit in the holes, and
are sufficient to fill up another hole.  However, the question remains -
does the hole in the old table get filled before or after the new table
has already passed by the hole?  If it is after, a third iteration would
allocate yet another refblock or two; but if it is before, then the 2nd
walk will already allocate refblocks when it encounters that spot in the
old reftable.  Remember, on the first iteration, we allocated a refblock
for 0-63 (slot 63), for 128-191 (slot 64), and so on - the reason the
second pass allocates a refblock for 64-127 is because the first pass
didn't populate that area of the file until it was already beyond
cluster 127.  But on the second pass, the first refblock we allocate for
clusters 64-127 will live somewhere around slot 8576, so it will be seen
during the second walk.  Even the refblock for clusters 8512-8575 will
end up being allocated somewhere around slot 8640, which still gets
visited during the second walk.  You may still be able to tweak things
to the point that we trigger an allocating third walk, but I'm not
readily seeing what that tweak would be.

At this point, I've spent far too long writing this email.  I haven't
completely ruled out the possibility of a corner case needing four
passes through the do loop, but the image sizes required to get there
are starting to be quite large compared to your simpler test of needing
three passes through the do loop.  I won't be bothered if we call it
good, and quit trying to come up with any other "interesting" allocation
sequencing.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-18 20:26       ` Eric Blake
  2014-11-19  5:52         ` Eric Blake
@ 2014-11-20 13:48         ` Max Reitz
  2014-11-20 21:27           ` Eric Blake
  1 sibling, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-20 13:48 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-18 at 21:26, Eric Blake wrote:
> On 11/17/2014 05:06 AM, Max Reitz wrote:
>
>>> Umm, that sounds backwards from what you document.  It's a good test of
>>> the _new_ reftable needing a second round of allocations.  So keep it
>>> with corrected comments.  But I think you _intended_ to write a test
>>> that starts with a refcount_width=64 image and resize to a
>>> refcount_width=1, where the _old_ reftable then suffers a reallocation
>>> as part of allocating refblocks for the new table.  It may even help if
>>> you add a tracepoint for every iteration through the walk function
>>> callback, to prove we are indeed executing it 3 times instead of the
>>> usual 2, for these test cases.
>> I'm currently thinking about a way to test the old reftable reallocation
>> issue, and I can't find any. So, for the old reftable to require a
>> reallocation it must grow. For it to grow we need some allocation beyond
>> what it can currently represent. For this to happen during the refblock
>> allocation walk, this allocation must be the allocation of a new refblock.
>>
>> If the refblock is allocated beyond the current reftable's limit, this
>> means that all clusters between free_cluster_index and that point
>> are already taken. If the reftable is then reallocated, it will
>> therefore *always* be allocated behind that refblock, which is beyond
>> its old limit. Therefore, that walk through the old reftable will never
>> miss that new allocation.
>>
>> So the issue can only occur if the old reftable is resized after the
>> walk through it, that is, when allocating the new reftable. That is
>> indeed an issue but I think it manifests itself basically like the issue
>> I'm testing here: There is now an area in the old refcount structures
>> which was free before but is used now, and the allocation causing
>> that was the allocation of the new reftable. The only difference is
>> whether it's the old or the new reftable that resides in the
>> previously free area. Thus, I think I'll leave it at this test – but if
>> you can describe to me how to create an image for a different "rewalk"
>> path, I'm all ears.
> =====
> The test you wrote does:
>
> original image, pre-walk:
> reftable is one cluster; with one refblock and 63 zero entries
>   that refblock holds 4096 width-1 refcounts; of those, the first 63 are
> non-zero, the remaining are zero. Image is 32256 bytes long
>
> During the first walk, we call operation() 64 times - the first time
> with refblock_empty false, the remaining 63 times with refblock_empty true.
>
> after first walk but before reftable allocation, we have allocated one
> refblock that holds 64 width-64 refcounts (all zero, because we don't
> populate them until the final walk); and the old table now has 64
> refcounts populated. Image is 32768 bytes long.
>
> Then we allocate a new reftable; so far, we only created one refblock
> for it to hold, so one cluster is sufficient. The allocation causes the
> old table to now have 65 refcounts populated. Image is now 33280 bytes long.
>
> On the second pass, we call operation() 64 times; now the first two
> calls have refblock_empty as false, which means we allocate a new
> refblock.  This allocation causes the old table to now have 66 refcounts
> populated. Image is now 33792 bytes long.
>
> So we free our first attempt at a new reftable, and allocate another (a
> single cluster is still sufficient to hold two refblocks); I'm not sure
> whether this free/realloc will reuse cluster 65 or if it will pick up
> cluster 67 and leave a hole in 65.  [I guess it depends on whether
> cluster allocation is done by first-fit analysis or whether it blindly
> favors allocating at the end of the image].

There is a free_cluster_index to speed up finding the first fit. It's 
reset when freeing clusters before that index, therefore cluster 65 
should be reused.
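
A minimal sketch of that behaviour, assuming a plain first-fit search
that starts at a cached hint: the class and method names below are
invented for illustration, only the free_cluster_index idea is taken
from the description above, and this is not the actual qcow2 allocator.

```python
class ClusterAllocator:
    """Toy first-fit allocator with a cached search hint."""

    def __init__(self, used):
        self.used = set(used)
        self.free_cluster_index = 0  # no free cluster exists below this index

    def alloc(self, count=1):
        """Allocate `count` contiguous clusters, first fit from the hint."""
        start = self.free_cluster_index
        while any((start + i) in self.used for i in range(count)):
            start += 1
        self.used.update(range(start, start + count))
        # Move the hint past clusters that are now known to be in use.
        while self.free_cluster_index in self.used:
            self.free_cluster_index += 1
        return start

    def free(self, start, count=1):
        """Free clusters; pull the hint back if they lie before it."""
        self.used.difference_update(range(start, start + count))
        if start < self.free_cluster_index:
            self.free_cluster_index = start


# Scenario from the quoted walk-through, with 0-based cluster indices:
# a 63-cluster image, then refblock, reftable, second refblock.
a = ClusterAllocator(range(63))
first_refblock = a.alloc()    # 63
reftable = a.alloc()          # 64
second_refblock = a.alloc()   # 65
a.free(reftable)
new_reftable = a.alloc()      # the freed cluster is reused

print(first_refblock, reftable, second_refblock, new_reftable)
```

In this model the reallocated reftable lands in the cluster that was
just freed, so no hole is left behind the second refblock.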

> Either way, we have to do a
> third iteration, because the second iteration allocated a refblock and
> "reallocated" a reftable.
>
> On the third pass, operation() is still called 64 times, but because the
> only two calls with refblock_empty as false already have an allocated
> refblock, no further allocations are needed, and we are done with the do
> loop; the fourth walk can set refcounts.
>
> =====
> The test I thought you were writing would start
>
> original image, pre-walk:
> reftable is one cluster; with one refblock and 63 zero entries
>   that refblock holds 64 width-64 refcounts; of those, the first 63 are
> non-zero, the remaining are zero. Image is 32256 bytes long
>
> During the first walk, we call operation() 1 time, with refblock_empty
> false.
>
> after first walk but before reftable allocation, we have allocated one
> refblock that holds 4096 width-1 refcounts (all zero, because we don't
> populate them until the final walk); and the old table now has 64
> refcounts populated. Image is 32768 bytes long.
>
> Then we allocate a new reftable; so far, we only created one refblock
> for it to hold, so one cluster is sufficient. The allocation causes the
> old table to now have 66 refcounts populated (one for the new refblock,
> but also one for an additional refblock in the old table because the
> first refblock was full). Image is now 33792 bytes long.
>
> On the second pass, we call operation() 1 time with refblock_empty as
> false, so we don't need any allocation.
>
> Which means the test you wrote is correct, while my idea does NOT
> trigger the third walk, at least not for the initial file size of 32256.
>   You've been vindicated, you did it correctly :)
>
>
> =====
> Now, in response to your question about some other 3-pass inducing
> pattern, let's think back to v1, where you questioned what would happen
> if a hole in the reftable gets turned into data due to a later
> allocation.  Let's see if I can come up with a scenario for that...
>
> Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
> as our two sizes.  If we downsize from 64 to 32 bits, then every two
> refblock clusters in the old table results in one call to operation()
> for the new table; conversely, if we upsize, then every refblock cluster
> in the old table gives two calls to operation() in the new table.  The
> trick at hand is to come up with some image where we punch a hole so
> that on the first pass, we call operation() with refblock_empty true for
> one iteration (necessarily a call later than the first, since the image
> header guarantees the first refblock is not empty), but where we have
> data after the hole, where it is the later data that triggers the
> allocation that will finally start to fill the hole.
>
> How about starting with an image that occupies between 1.5 and 2
> refblocks worth of 32-width clusters (so an image anywhere between 193
> and 256 clusters, or between 98816 and 131072 bytes).  You should be
> able to figure out how many clusters this consumes for L1, L2, plus 1
> for header, reftable, and 2 for refblocks, in order to figure out how
> many remaining clusters are dedicated to data; ideally, the data
> clusters are contiguous, and occupy a swath that covers at least
> clusters 126 through 192.  Widening to 64-bit width will require 4
> refblocks instead of 2, if all refblocks are needed.  But the whole idea
> of punching a hole is that we don't need a refblock if it will be
> all-zero entries.  So take this original image, and discard the data
> clusters from physical index 126 through 192, (this is NOT the data
> visible at guest offset 31744, but whatever actual offset of guest data
> that maps to physical offset 31744).  The old reftable now looks like {
> refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
> empty, 192-whatever occupied, tail empty] }.  With no allocations
> required, this would in turn map to the following new refblocks: {
> refblock_n1 [0-63 occupied], refblock_n2 [64-125 occupied, 126-127
> empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
> not need to allocate refblock_n3 because of the hole in the old
> refblock; we DO end up allocating three refblocks, but in the sequence
> of things, refblock_n1 and refblock_n2 are allocated while we are
> visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
> not allocated until after we have already passed over the first half of
> refblock_o2.
>
> Thus, the second walk over the image will see that we need to allocate
> refblock_n3 because it now contains entries (in particular, the entry
> for refblock_n4, but also the 1-cluster entry for the proposed reftable
> that is allocated between the walks).  So, while your test used the
> allocation of the reftable as the spillover point, my scenario here uses
> the allocation of later refblocks as the spillover point that got missed
> during the first iteration.

Sounds good, the only problem is that I'd have to hand-craft the image 
myself, because qemu generally uses self-references for refblocks (when 
allocating new refblocks, they will contain their own refcount).

I think this already would be too much effort (I'll reply to your second 
email right away ;-)). There is no fundamental difference in how new 
allocations for the new reftable and for the new refblocks are treated: 
If there's a new allocation, respin. If that works for the new reftable, 
that's enough to convince me it will work for new refblocks as well.

Max

* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-19  5:52         ` Eric Blake
@ 2014-11-20 14:03           ` Max Reitz
  2014-11-20 21:21             ` Eric Blake
  0 siblings, 1 reply; 46+ messages in thread
From: Max Reitz @ 2014-11-20 14:03 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-19 at 06:52, Eric Blake wrote:
> On 11/18/2014 01:26 PM, Eric Blake wrote:
>
>> Now, in response to your question about some other 3-pass inducing
>> pattern, let's think back to v1, where you questioned what would happen
>> if a hole in the reftable gets turned into data due to a later
>> allocation.  Let's see if I can come up with a scenario for that...
>>
>> Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
>> as our two sizes.  If we downsize from 64 to 32 bits, then every two
>> refblock clusters in the old table results in one call to operation()
>> for the new table; conversely, if we upsize, then every refblock cluster
>> in the old table gives two calls to operation() in the new table.  The
>> trick at hand is to come up with some image where we punch a hole so
>> that on the first pass, we call operation() with refblock_empty true for
>> one iteration (necessarily a call later than the first, since the image
>> header guarantees the first refblock is not empty), but where we have
>> data after the hole, where it is the later data that triggers the
>> allocation that will finally start to fill the hole.
>>
>> How about starting with an image that occupies between 1.5 and 2
>> refblocks worth of 32-width clusters (so an image anywhere between 193
>> and 256 clusters, or between 98816 and 131072 bytes).  You should be
>> able to figure out how many clusters this consumes for L1, L2, plus 1
>> for header, reftable, and 2 for refblocks, in order to figure out how
>> many remaining clusters are dedicated to data; ideally, the data
>> clusters are contiguous, and occupy a swath that covers at least
>> clusters 126 through 192.  Widening to 64-bit width will require 4
>> refblocks instead of 2, if all refblocks are needed.  But the whole idea
>> of punching a hole is that we don't need a refblock if it will be
>> all-zero entries.  So take this original image, and discard the data
>> clusters from physical index 126 through 192, (this is NOT the data
>> visible at guest offset 31744, but whatever actual offset of guest data
>> that maps to physical offset 31744).  The old reftable now looks like {
>> refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
>> empty, 192-whatever occupied, tail empty] }.  With no allocations
>> required, this would in turn map to the following new refblocks: {
>> refblock_n1 [0-63 occupied], refblock_n2 [64-125 occupied, 126-127
>> empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
>> not need to allocate refblock_n3 because of the hole in the old
>> refblock; we DO end up allocating three refblocks, but in the sequence
>> of things, refblock_n1 and refblock_n2 are allocated while we are
>> visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
>> not allocated until after we have already passed over the first half of
>> refblock_o2.
>>
>> Thus, the second walk over the image will see that we need to allocate
>> refblock_n3 because it now contains entries (in particular, the entry
>> for refblock_n4, but also the 1-cluster entry for the proposed reftable
>> that is allocated between the walks).  So, while your test used the
>> allocation of the reftable as the spillover point, my scenario here uses
>> the allocation of later refblocks as the spillover point that got missed
>> during the first iteration.
>>
> Oops,...
>
>> which means the reftable now looks like { refblock1, NULL, refblock3,
>> NULL... }; and where refblock1 now has at least two free entries
>> (possibly three, if the just-freed refblock2 happened to live before
>> cluster 62).  is we can also free refblock2
>>
> ...forgot to delete these random thoughts that I typed up but no longer
> needed after reworking the above text.
>
> At any rate, I'm not certain we can come up with a four-pass scenario;
> if it is even possible, it would be quite complex.

[snip] (But rest assured, I read it all ;-))

> At this point, I've spent far too long writing this email.  I haven't
> completely ruled out the possibility of a corner case needing four
> passes through the do loop, but the image sizes required to get there
> are starting to be quite large compared to your simpler test of needing
> three passes through the do loop.

Right, see test 026. Without an SSD, it takes more than ten minutes, not 
least because it tests resizing the reftable which means writing a lot 
of data to an image with 512 byte clusters.

> I won't be bothered if we call it
> good, and quit trying to come up with any other "interesting" allocation
> sequencing.

The problem is, in my opinion, that we won't gain a whole lot from 
proving that there are cases where you need a fourth pass and testing these 
cases. Fundamentally, they are not different from cases with three 
passes (technically, not even different from two pass cases). You scan 
through the refcounts, you detect that you need refblocks which you have 
not yet allocated, you allocate them, then you respin until all 
allocations are done. The only problem would be whether it'd be possible 
to run into an infinite loop: Can allocating new refblocks lead to a 
case where we have to allocate even more refblocks? Well, just judging 
from how complicated it is to even find a case where the number of new 
allocations hasn't gone down to zero in the fourth pass, we can safely 
rule that out.

Some people may ask why the walks are performed in a loop without a 
fixed limit (because they can't find cases where allocations haven't 
settled at the third pass). But I doubt that'll be a serious problem. 
It's much easier to have such a basically unlimited loop with the 
reasoning "We don't know exactly how many loops it'll take, but it will 
definitely settle at some point in time" than limiting the loop and then 
having to explain why we know exactly that it won't take more than X 
passes. The only problem with not limiting is that we need one walk to 
verify that all allocations have settled. But we need that for the 
common case (two passes) anyway, so that's not an issue.

The code from this version does not care whether it takes one, two, 
three, four or 42 passes. It's all the same. It will never take one and 
it will probably never take 42 passes; but if it does, well, it will 
work. Therefore, I think testing one non-standard number of passes 
(three) is enough. I'd like to test more, but the effort's just not 
worth it. I think.
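
The "respin until all allocations are done" idea can be illustrated
with a toy fixed-point loop.  This is a sketch under strong
simplifications, not the code from this series: allocation is plain
first-fit, the old refcount structures never grow, one new refblock
covers 64 clusters, and the tentative reftable is freed and reallocated
after every walk that allocated something.  All names are hypothetical.

```python
def first_fit(used, count):
    """Allocate `count` contiguous clusters at the lowest free position."""
    start = 0
    while any((start + i) in used for i in range(count)):
        start += 1
    used.update(range(start, start + count))
    return start


def passes_until_settled(allocated, per_refblock=64, entries_per_cluster=64):
    used = set(allocated)
    refblocks = {}    # window index -> cluster holding that new refblock
    reftable = None   # (start, count) of the tentative new reftable
    passes = 0
    while True:       # no fixed limit: stop once a walk allocates nothing
        passes += 1
        new_allocation = False
        window = 0
        while window * per_refblock <= max(used):
            start = window * per_refblock
            in_use = any(c in used for c in range(start, start + per_refblock))
            if in_use and window not in refblocks:
                refblocks[window] = first_fit(used, 1)
                new_allocation = True
            window += 1
        if not new_allocation and reftable is not None:
            return passes, refblocks, reftable
        if reftable is not None:
            used.difference_update(range(reftable[0], reftable[0] + reftable[1]))
        count = -(-(max(refblocks) + 1) // entries_per_cluster)
        reftable = (first_fit(used, count), count)


# The 63-cluster image from the test discussed in this thread.
small = passes_until_settled(range(63))
# The "first try" layout: 0-125 and 192-8127 allocated.
first_try = passes_until_settled(set(range(126)) | set(range(192, 8128)))

print(small[0], first_try[0])
```

In this model both images settle on the third walk: the third pass
finds nothing left to allocate, so the loop ends without needing a
hard-coded pass limit.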

Max

* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-20 14:03           ` Max Reitz
@ 2014-11-20 21:21             ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-20 21:21 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/20/2014 07:03 AM, Max Reitz wrote:
> Some people may ask why the walks are performed in a loop without a
> fixed limit (because they can't find cases where allocations haven't
> settled at the third pass). But I doubt that'll be a serious problem.
> It's much easier to have such a basically unlimited loop with the
> reasoning "We don't know exactly how many loops it'll take, but it will
> definitely settle at some point in time" than limiting the loop and then
> having to explain why we know exactly that it won't take more than X
> passes. The only problem with not limiting is that we need one walk to
> verify that all allocations have settled. But we need that for the
> common case (two passes) anyway, so that's not an issue.
> 
> The code from this version does not care whether it takes one, two,
> three, four or 42 passes. It's all the same. It will never take one and
> it will probably never take 42 passes; but if it does, well, it will
> work. Therefore, I think testing one non-standard number of passes
> (three) is enough. I'd like to test more, but the effort's just not
> worth it. I think.

Yep, I agree.  I've pretty much convinced myself that the REASON we are
guaranteed that things converge is that each successive iteration
allocates fewer clusters than the one before, and that in later
iterations, refblocks are not fully populated by these fewer allocations
(that is, on recursion, we are allocating geometrically less).

I think I may have found a case that needs four passes.  What if between
the first and second pass, we have enough refblocks to require
allocating 2752 or more contiguous clusters for the new reftable (again
continuing with my 64-bit from 32-bit example, this means at least 1376
contiguous clusters in the old reftable).  That's a huge image already
(176128 refblocks, 11,272,192 clusters, or 5,771,362,304 bytes).  If we
time things so that the first pass ends without spilling the old
reftable (which by now seems fairly tractable to compute how many spare
clusters to start with), then allocating the new reftable will also
spill the old reftable, and based on the reftables alone, will result in
more than 4096 newly-referenced clusters on the second pass (or more
than 64 new refblocks).  This in turn is enough to require another full
refblock just to describe the reftable, but that spills the size of the
new reftable, so between the second and third iteration we now have to
allocate 2753 instead of 2752 contiguous clusters.  And _that_
reallocation is enough for the third pass to have to allocate yet more
clusters.  But like you say, testing this is going to be prohibitively
slow (it's not worth a 5 gigabyte test).
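
The sizes in this scenario follow from the same geometry and can be
checked with a short sketch.  It assumes 512-byte clusters, 8-byte
reftable entries, and that both reftables end up as newly referenced
clusters; it only verifies the arithmetic and does not simulate the
passes.

```python
CLUSTER_SIZE = 512
ENTRIES_PER_REFTABLE_CLUSTER = CLUSTER_SIZE // 8   # 64
NEW_PER_REFBLOCK = CLUSTER_SIZE * 8 // 64          # 64 refcounts (64-bit)
OLD_PER_REFBLOCK = CLUSTER_SIZE * 8 // 32          # 128 refcounts (32-bit)

new_reftable_clusters = 2752
new_refblocks = new_reftable_clusters * ENTRIES_PER_REFTABLE_CLUSTER
image_clusters = new_refblocks * NEW_PER_REFBLOCK
image_bytes = image_clusters * CLUSTER_SIZE

old_refblocks = image_clusters // OLD_PER_REFBLOCK
old_reftable_clusters = old_refblocks // ENTRIES_PER_REFTABLE_CLUSTER

# Clusters that become referenced when both reftables are (re)allocated.
newly_referenced = new_reftable_clusters + old_reftable_clusters
extra_refblocks = -(-newly_referenced // NEW_PER_REFBLOCK)   # rounded up

# Reftable entries needed once the extra refblocks are counted as well.
needed_entries = new_refblocks + extra_refblocks
reftable_capacity = new_reftable_clusters * ENTRIES_PER_REFTABLE_CLUSTER

print(new_refblocks, image_clusters, image_bytes)
print(old_reftable_clusters, newly_referenced, extra_refblocks)
print(needed_entries > reftable_capacity)
```

Under these assumptions the two reftables together reference 4128
clusters, which takes 65 additional refblocks, and that is more than a
2752-cluster reftable can index.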

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


* Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
  2014-11-20 13:48         ` Max Reitz
@ 2014-11-20 21:27           ` Eric Blake
  0 siblings, 0 replies; 46+ messages in thread
From: Eric Blake @ 2014-11-20 21:27 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 11/20/2014 06:48 AM, Max Reitz wrote:
> Sounds good, the only problem is that I'd have to hand-craft the image
> myself, because qemu generally uses self-references for refblocks (when
> allocating new refblocks, they will contain their own refcount).
> 
> I think this already would be too much effort (I'll reply to your second
> email right away ;-)). There is no fundamental difference in how new
> allocations for the new reftable and for the new refblocks are treated:
> If there's a new allocation, respin. If that works for the new reftable,
> that's enough to convince me it will work for new refblocks as well.

Agreed - one test of needing a third pass is sufficient to prove we
handle allocations until convergence.


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


end of thread, other threads:[~2014-11-20 21:28 UTC | newest]

Thread overview: 46+ messages
2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-15 16:00   ` Eric Blake
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation Max Reitz
2014-11-15 16:50   ` Eric Blake
2014-11-17  8:37     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-15 17:02   ` Eric Blake
2014-11-17  8:42     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers " Max Reitz
2014-11-15 17:08   ` Eric Blake
2014-11-17  8:44     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-15 17:09   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-15 17:13   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-15 17:17   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-18 17:55   ` Eric Blake
2014-11-18 18:58     ` Max Reitz
2014-11-18 19:56       ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-15 14:50   ` Eric Blake
2014-11-17  8:34     ` Max Reitz
2014-11-17 10:38       ` Max Reitz
2014-11-17 11:02         ` Max Reitz
2014-11-17 12:06     ` Max Reitz
2014-11-18 20:26       ` Eric Blake
2014-11-19  5:52         ` Eric Blake
2014-11-20 14:03           ` Max Reitz
2014-11-20 21:21             ` Eric Blake
2014-11-20 13:48         ` Max Reitz
2014-11-20 21:27           ` Eric Blake