* [RFC PATCH v3 00/27] Add subcluster allocation to qcow2
@ 2019-12-22 11:36 Alberto Garcia
  2019-12-22 11:36 ` [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta() Alberto Garcia
                   ` (27 more replies)
  0 siblings, 28 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

This version fixes many of the problems highlighted by Max. I decided
not to replace the cluster logic completely with subcluster logic in
all cases because I felt that in some places it only complicated the
code. Let's see what you think :-)

Berto

v3:
- Patch 01: Rename host_offset to host_cluster_offset and make 'bytes'
            an unsigned int [Max]
- Patch 03: Rename cluster_needs_cow to cluster_needs_new_alloc and
            count_cow_clusters to count_single_write_clusters. Update
            documentation and add more assertions and checks [Max]
- Patch 09: Update qcow2_co_truncate() to properly support extended L2
            entries [Max]
- Patch 10: Forbid calling set_l2_bitmap() if the image does not have
            extended L2 entries [Max]
- Patch 11 (new): Add QCow2SubclusterType [Max]
- Patch 12 (new): Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
- Patch 13 (new): Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
- Patch 14: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
- Patch 15: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
- Patch 19: Don't call set_l2_bitmap() if the image does not have
            extended L2 entries [Max]
- Patch 21: Use smaller data types.
- Patch 22: Don't call set_l2_bitmap() if the image does not have
            extended L2 entries [Max]
- Patch 23: Use smaller data types.
- Patch 25: Update test results and documentation. Move the check for
            the minimum subcluster size to validate_cluster_size().
- Patch 26 (new): Add subcluster support to qcow2_measure()
- Patch 27: Add more tests

v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
- Patch 12: Update after the changes in 88f468e546.
- Patch 21 (new): Clear the L2 bitmap when allocating a compressed
  cluster. Compressed clusters should have the bitmap all set to 0.
- Patch 24: Document the new fields in the QAPI documentation [Eric].
- Patch 25: Allow qcow2 preallocation with backing files.
- Patch 26: Add some tests for qcow2 images with extended L2 entries.

v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v2:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/27:[0013] [FC] 'qcow2: Add calculate_l2_meta()'
002/27:[----] [-C] 'qcow2: Split cluster_needs_cow() out of count_cow_clusters()'
003/27:[0083] [FC] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()'
004/27:[----] [-C] 'qcow2: Add get_l2_entry() and set_l2_entry()'
005/27:[----] [--] 'qcow2: Document the Extended L2 Entries feature'
006/27:[----] [--] 'qcow2: Add dummy has_subclusters() function'
007/27:[----] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
008/27:[----] [--] 'qcow2: Add offset_to_sc_index()'
009/27:[0008] [FC] 'qcow2: Add l2_entry_size()'
010/27:[0008] [FC] 'qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()'
011/27:[down] 'qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()'
012/27:[down] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
013/27:[down] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
014/27:[0060] [FC] 'qcow2: Add subcluster support to calculate_l2_meta()'
015/27:[0091] [FC] 'qcow2: Add subcluster support to qcow2_get_cluster_offset()'
016/27:[----] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
017/27:[----] [--] 'qcow2: Add subcluster support to discard_in_l2_slice()'
018/27:[----] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
019/27:[0008] [FC] 'qcow2: Add subcluster support to expand_zero_clusters_in_l1()'
020/27:[----] [--] 'qcow2: Fix offset calculation in handle_dependencies()'
021/27:[0007] [FC] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
022/27:[0004] [FC] 'qcow2: Clear the L2 bitmap when allocating a compressed cluster'
023/27:[0002] [FC] 'qcow2: Add subcluster support to handle_alloc_space()'
024/27:[----] [-C] 'qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only'
025/27:[0049] [FC] 'qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit'
026/27:[down] 'qcow2: Add subcluster support to qcow2_measure()'
027/27:[0046] [FC] 'iotests: Add tests for qcow2 images with extended L2 entries'

Alberto Garcia (27):
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  qcow2: Add subcluster support to calculate_l2_meta()
  qcow2: Add subcluster support to qcow2_get_cluster_offset()
  qcow2: Add subcluster support to zero_in_l2_slice()
  qcow2: Add subcluster support to discard_in_l2_slice()
  qcow2: Add subcluster support to check_refcounts_l2()
  qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  qcow2: Fix offset calculation in handle_dependencies()
  qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  qcow2: Clear the L2 bitmap when allocating a compressed cluster
  qcow2: Add subcluster support to handle_alloc_space()
  qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  qcow2: Add subcluster support to qcow2_measure()
  iotests: Add tests for qcow2 images with extended L2 entries

 block/qcow2-cluster.c            | 645 ++++++++++++++++++++-----------
 block/qcow2-refcount.c           |  38 +-
 block/qcow2.c                    | 200 +++++++---
 block/qcow2.h                    | 150 ++++++-
 docs/interop/qcow2.txt           |  68 +++-
 docs/qcow2-cache.txt             |  19 +-
 include/block/block_int.h        |   1 +
 qapi/block-core.json             |   7 +
 tests/qemu-iotests/031.out       |   8 +-
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 ++---
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +-
 tests/qemu-iotests/065           |  18 +-
 tests/qemu-iotests/082.out       |  48 ++-
 tests/qemu-iotests/085.out       |  38 +-
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +-
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 +
 tests/qemu-iotests/242.out       |   5 +
 tests/qemu-iotests/255.out       |   8 +-
 tests/qemu-iotests/271           | 256 ++++++++++++
 tests/qemu-iotests/271.out       | 208 ++++++++++
 tests/qemu-iotests/273.out       |   9 +-
 tests/qemu-iotests/common.filter |   1 +
 tests/qemu-iotests/group         |   1 +
 28 files changed, 1455 insertions(+), 420 deletions(-)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

-- 
2.20.1




* [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 13:28   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.
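
As an illustration only, the copy-on-write regions that the new
function computes for a write request can be sketched in standalone C
as shown here. The struct and function names (cow_region, cow_layout,
cow_layout_for_write) are hypothetical and are not part of QEMU; the
real code stores these values in a QCowL2Meta and also links it into
BDRVQcow2State.cluster_allocs, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one COW region of QCowL2Meta.
 * Offsets are relative to the start of the first cluster touched. */
struct cow_region {
    unsigned offset;
    unsigned nb_bytes;
};

/* Hypothetical container for the values calculate_l2_meta() derives. */
struct cow_layout {
    unsigned nb_clusters;
    struct cow_region cow_start; /* data to copy before the guest data */
    struct cow_region cow_end;   /* data to copy after the guest data */
};

/* Assumes cluster_size is a power of two, as it is in qcow2. */
static struct cow_layout cow_layout_for_write(uint64_t guest_offset,
                                              unsigned bytes,
                                              unsigned cluster_size)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    assert(bytes != 0);

    /* Same roles as cow_start_to, cow_end_from and cow_end_to in the patch */
    unsigned start_to = (unsigned)(guest_offset & (cluster_size - 1));
    unsigned end_from = start_to + bytes;
    unsigned end_to =
        (end_from + cluster_size - 1) / cluster_size * cluster_size;

    struct cow_layout layout = {
        .nb_clusters = end_to / cluster_size,
        .cow_start = { .offset = 0, .nb_bytes = start_to },
        .cow_end = { .offset = end_from, .nb_bytes = end_to - end_from },
    };
    return layout;
}
```

For example, an 8 KiB write at guest offset 0x11000 with 64 KiB
clusters touches one cluster, with 0x1000 bytes to copy before the
guest data and 0xd000 bytes after it.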

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 77 +++++++++++++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8982b7b762..617618dc54 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1019,6 +1019,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
                         QCOW2_DISCARD_NEVER);
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+                              uint64_t host_cluster_offset,
+                              uint64_t guest_offset, unsigned bytes,
+                              QCowL2Meta **m, bool keep_old)
+{
+    BDRVQcow2State *s = bs->opaque;
+    unsigned cow_start_from = 0;
+    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+    unsigned cow_end_from = cow_start_to + bytes;
+    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+    QCowL2Meta *old_m = *m;
+
+    *m = g_malloc0(sizeof(**m));
+    **m = (QCowL2Meta) {
+        .next           = old_m,
+
+        .alloc_offset   = host_cluster_offset,
+        .offset         = start_of_cluster(s, guest_offset),
+        .nb_clusters    = nb_clusters,
+
+        .keep_old_clusters = keep_old,
+
+        .cow_start = {
+            .offset     = cow_start_from,
+            .nb_bytes   = cow_start_to - cow_start_from,
+        },
+        .cow_end = {
+            .offset     = cow_end_from,
+            .nb_bytes   = cow_end_to - cow_end_from,
+        },
+    };
+
+    qemu_co_queue_init(&(*m)->dependent_requests);
+    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1417,35 +1467,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
     int avail_bytes = nb_clusters << s->cluster_bits;
     int nb_bytes = MIN(requested_bytes, avail_bytes);
-    QCowL2Meta *old_m = *m;
-
-    *m = g_malloc0(sizeof(**m));
-
-    **m = (QCowL2Meta) {
-        .next           = old_m,
-
-        .alloc_offset   = alloc_cluster_offset,
-        .offset         = start_of_cluster(s, guest_offset),
-        .nb_clusters    = nb_clusters,
-
-        .keep_old_clusters  = keep_old_clusters,
-
-        .cow_start = {
-            .offset     = 0,
-            .nb_bytes   = offset_into_cluster(s, guest_offset),
-        },
-        .cow_end = {
-            .offset     = nb_bytes,
-            .nb_bytes   = avail_bytes - nb_bytes,
-        },
-    };
-    qemu_co_queue_init(&(*m)->dependent_requests);
-    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 
     *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                      m, keep_old_clusters);
+
     return 1;
 
 fail:
-- 
2.20.1




* [RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
  2019-12-22 11:36 ` [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 13:32   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Move the check into a separate cluster_needs_cow() function: we are
going to need it in other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-cluster.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 617618dc54..e078bddcc2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1069,6 +1069,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+    switch (qcow2_get_cluster_type(bs, l2_entry)) {
+    case QCOW2_CLUSTER_NORMAL:
+        if (l2_entry & QCOW_OFLAG_COPIED) {
+            return false;
+        }
+    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return true;
+    default:
+        abort();
+    }
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1081,25 +1099,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
 
     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-        switch(cluster_type) {
-        case QCOW2_CLUSTER_NORMAL:
-            if (l2_entry & QCOW_OFLAG_COPIED) {
-                goto out;
-            }
+        if (!cluster_needs_cow(bs, l2_entry)) {
             break;
-        case QCOW2_CLUSTER_UNALLOCATED:
-        case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
-            break;
-        default:
-            abort();
         }
     }
 
-out:
     assert(i <= nb_clusters);
     return i;
 }
-- 
2.20.1




* [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
  2019-12-22 11:36 ` [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta() Alberto Garcia
  2019-12-22 11:36 ` [RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 14:53   ` Eric Blake
                     ` (2 more replies)
  2019-12-22 11:36 ` [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
                   ` (24 subsequent siblings)
  27 siblings, 3 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
     allocated and have a reference count of 1. In those clusters we
     can simply write the data and there is no need to perform any
     copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
     either because they haven't been allocated yet, because their
     reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly well be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
     which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
     reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
     allocation (all other cases).

One important difference after this change is that clusters found in
handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.
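
As a rough sketch of the new split (not the actual QEMU code), the
decision can be modelled in standalone C as shown here. The enum and
function names are hypothetical; copied_flag stands for
QCOW_OFLAG_COPIED being set in the L2 entry, i.e. a reference count
of exactly 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the cluster types used in this patch */
enum cluster_type {
    CL_UNALLOCATED,
    CL_ZERO_PLAIN,
    CL_ZERO_ALLOC,
    CL_NORMAL,
    CL_COMPRESSED,
};

enum write_path {
    PATH_HANDLE_COPIED, /* overwrite the existing allocation in place */
    PATH_HANDLE_ALLOC,  /* a new allocation is needed */
};

/* Which of the two functions is expected to pick up a cluster */
static enum write_path write_path_for(enum cluster_type type,
                                      bool copied_flag)
{
    assert(type <= CL_COMPRESSED);

    switch (type) {
    case CL_NORMAL:
    case CL_ZERO_ALLOC:
        /* Already allocated: reusable only if nobody else references it */
        return copied_flag ? PATH_HANDLE_COPIED : PATH_HANDLE_ALLOC;
    default:
        /* Unallocated, compressed and plain zero clusters */
        return PATH_HANDLE_ALLOC;
    }
}

/*
 * For a cluster that is overwritten in place, a write that covers only
 * part of it still needs COW unless the cluster already holds normal
 * data.
 */
static bool in_place_write_may_need_cow(enum cluster_type type)
{
    return type != CL_NORMAL;
}
```

With this model, a ZERO_ALLOC cluster with the COPIED flag set goes
through handle_copied() but may still need copy-on-write for the
parts of the cluster that the request does not cover, which is the
difference described above.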

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 226 +++++++++++++++++++++++-------------------
 1 file changed, 126 insertions(+), 100 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e078bddcc2..9387f15866 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1021,13 +1021,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1035,15 +1040,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
                               uint64_t host_cluster_offset,
                               uint64_t guest_offset, unsigned bytes,
-                              QCowL2Meta **m, bool keep_old)
+                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned cow_start_from = 0;
+    int l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry;
+    unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
-    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
+    QCow2ClusterType type;
+
+    assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+    /* Return if there's no COW (all clusters are normal and we keep them) */
+    if (keep_old) {
+        int i;
+        for (i = 0; i < nb_clusters; i++) {
+            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+                break;
+            }
+        }
+        if (i == nb_clusters) {
+            return;
+        }
+    }
+
+    /* Get the L2 entry from the first cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_start_from = cow_start_to;
+    } else {
+        cow_start_from = 0;
+    }
+
+    /* Get the L2 entry from the last cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_end_to = cow_end_from;
+    } else {
+        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    }
 
     *m = g_malloc0(sizeof(**m));
     **m = (QCowL2Meta) {
@@ -1069,18 +1112,20 @@ static void calculate_l2_meta(BlockDriverState *bs,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing to a cluster requires COW */
-static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+/* Returns true if writing to the cluster pointed to by @l2_entry
+ * requires a new allocation (that is, if the cluster is unallocated
+ * or has refcount > 1 and therefore cannot be written in-place). */
+static bool cluster_needs_new_alloc(BlockDriverState *bs, uint64_t l2_entry)
 {
     switch (qcow2_get_cluster_type(bs, l2_entry)) {
     case QCOW2_CLUSTER_NORMAL:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
         if (l2_entry & QCOW_OFLAG_COPIED) {
             return false;
         }
     case QCOW2_CLUSTER_UNALLOCATED:
     case QCOW2_CLUSTER_COMPRESSED:
     case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
         return true;
     default:
         abort();
@@ -1088,20 +1133,36 @@ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 }
 
 /*
- * Returns the number of contiguous clusters that can be used for an allocating
- * write, but require COW to be performed (this includes yet unallocated space,
- * which must copy from the backing file)
+ * Returns the number of contiguous clusters that can be written to
+ * using one single write request, starting from @l2_index.
+ * At most @nb_clusters are checked.
+ *
+ * If @new_alloc is true this counts clusters that are either
+ * unallocated, or allocated but with refcount > 1 (so they need to be
+ * newly allocated and COWed).
+ *
+ * If @new_alloc is false this counts clusters that are already
+ * allocated and can be overwritten in-place (this includes clusters
+ * of type QCOW2_CLUSTER_ZERO_ALLOC).
  */
-static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
-    uint64_t *l2_slice, int l2_index)
+static int count_single_write_clusters(BlockDriverState *bs, int nb_clusters,
+                                       uint64_t *l2_slice, int l2_index,
+                                       bool new_alloc)
 {
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        if (!cluster_needs_cow(bs, l2_entry)) {
+        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        if (cluster_needs_new_alloc(bs, l2_entry) != new_alloc) {
             break;
         }
+        if (!new_alloc && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            break;
+        }
+        expected_offset += s->cluster_size;
     }
 
     assert(i <= nb_clusters);
@@ -1172,10 +1233,10 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 }
 
 /*
- * Checks how many already allocated clusters that don't require a copy on
- * write there are at the given guest_offset (up to *bytes). If *host_offset is
- * not INV_OFFSET, only physically contiguous clusters beginning at this host
- * offset are counted.
+ * Checks how many already allocated clusters that don't require a new
+ * allocation there are at the given guest_offset (up to *bytes).
+ * If *host_offset is not INV_OFFSET, only physically contiguous clusters
+ * beginning at this host offset are counted.
  *
  * Note that guest_offset may not be cluster aligned. In this case, the
  * returned *host_offset points to exact byte referenced by guest_offset and
@@ -1184,12 +1245,12 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
  * Returns:
  *   0:     if no allocated clusters are available at the given offset.
  *          *bytes is normally unchanged. It is set to 0 if the cluster
- *          is allocated and doesn't need COW, but doesn't have the right
- *          physical offset.
+ *          is allocated and can be overwritten in-place but doesn't have
+ *          the right physical offset.
  *
- *   1:     if allocated clusters that don't require a COW are available at
- *          the requested offset. *bytes may have decreased and describes
- *          the length of the area that can be written to.
+ *   1:     if allocated clusters that can be overwritten in place are
+ *          available at the requested offset. *bytes may have decreased
+ *          and describes the length of the area that can be written to.
  *
  *  -errno: in error cases
  */
@@ -1219,7 +1280,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     l2_index = offset_to_l2_slice_index(s, guest_offset);
     nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
-    assert(nb_clusters <= INT_MAX);
+    /* Limit total byte count to BDRV_REQUEST_MAX_BYTES */
+    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
 
     /* Find L2 entry for the first involved cluster */
     ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
@@ -1229,18 +1291,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     cluster_offset = be64_to_cpu(l2_slice[l2_index]);
 
-    /* Check how many clusters are already allocated and don't need COW */
-    if (qcow2_get_cluster_type(bs, cluster_offset) == QCOW2_CLUSTER_NORMAL
-        && (cluster_offset & QCOW_OFLAG_COPIED))
-    {
+    if (!cluster_needs_new_alloc(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
         bool offset_matches =
             (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
 
         if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
+            qcow2_signal_corruption(bs, true, -1, -1, "%s cluster offset "
                                     "%#llx unaligned (guest offset: %#" PRIx64
-                                    ")", cluster_offset & L2E_OFFSET_MASK,
+                                    ")", cluster_offset & QCOW_OFLAG_ZERO ?
+                                    "Preallocated zero" : "Data",
+                                    cluster_offset & L2E_OFFSET_MASK,
                                     guest_offset);
             ret = -EIO;
             goto out;
@@ -1253,15 +1314,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         }
 
         /* We keep all QCOW_OFLAG_COPIED clusters */
-        keep_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index],
-                                      QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
+        keep_clusters = count_single_write_clusters(bs, nb_clusters, l2_slice,
+                                                    l2_index, false);
         assert(keep_clusters <= nb_clusters);
 
         *bytes = MIN(*bytes,
                  keep_clusters * s->cluster_size
                  - offset_into_cluster(s, guest_offset));
+        assert(*bytes != 0);
+
+        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
+                          *bytes, l2_slice, m, true);
 
         ret = 1;
     } else {
@@ -1337,9 +1400,10 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
 }
 
 /*
- * Allocates new clusters for an area that either is yet unallocated or needs a
- * copy on write. If *host_offset is not INV_OFFSET, clusters are only
- * allocated if the new allocation can match the specified host offset.
+ * Allocates new clusters for an area that either is yet unallocated or
+ * cannot be overwritten in-place. If *host_offset is not INV_OFFSET,
+ * clusters are only allocated if the new allocation can match the specified
+ * host offset.
  *
  * Note that guest_offset may not be cluster aligned. In this case, the
  * returned *host_offset points to exact byte referenced by guest_offset and
@@ -1362,12 +1426,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     BDRVQcow2State *s = bs->opaque;
     int l2_index;
     uint64_t *l2_slice;
-    uint64_t entry;
     uint64_t nb_clusters;
     int ret;
-    bool keep_old_clusters = false;
 
-    uint64_t alloc_cluster_offset = INV_OFFSET;
+    uint64_t alloc_cluster_offset;
 
     trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
                              *bytes);
@@ -1382,10 +1444,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
 
     l2_index = offset_to_l2_slice_index(s, guest_offset);
     nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
-    assert(nb_clusters <= INT_MAX);
-
-    /* Limit total allocation byte count to INT_MAX */
-    nb_clusters = MIN(nb_clusters, INT_MAX >> s->cluster_bits);
+    /* Limit total allocation byte count to BDRV_REQUEST_MAX_BYTES */
+    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
 
     /* Find L2 entry for the first involved cluster */
     ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
@@ -1393,67 +1453,32 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    entry = be64_to_cpu(l2_slice[l2_index]);
-    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index);
+    nb_clusters = count_single_write_clusters(bs, nb_clusters,
+                                              l2_slice, l2_index, true);
 
     /* This function is only called when there were no non-COW clusters, so if
      * we can't find any unallocated or COW clusters either, something is
      * wrong with our code. */
     assert(nb_clusters > 0);
 
-    if (qcow2_get_cluster_type(bs, entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
-        (entry & QCOW_OFLAG_COPIED) &&
-        (*host_offset == INV_OFFSET ||
-         start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
-    {
-        int preallocated_nb_clusters;
-
-        if (offset_into_cluster(s, entry & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Preallocated zero "
-                                    "cluster offset %#llx unaligned (guest "
-                                    "offset: %#" PRIx64 ")",
-                                    entry & L2E_OFFSET_MASK, guest_offset);
-            ret = -EIO;
-            goto fail;
-        }
-
-        /* Try to reuse preallocated zero clusters; contiguous normal clusters
-         * would be fine, too, but count_cow_clusters() above has limited
-         * nb_clusters already to a range of COW clusters */
-        preallocated_nb_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_COPIED);
-        assert(preallocated_nb_clusters > 0);
-
-        nb_clusters = preallocated_nb_clusters;
-        alloc_cluster_offset = entry & L2E_OFFSET_MASK;
-
-        /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
-         * should not free them. */
-        keep_old_clusters = true;
+    /* Allocate at a given offset in the image file */
+    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
+        start_of_cluster(s, *host_offset);
+    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
+                                  &nb_clusters);
+    if (ret < 0) {
+        goto out;
     }
 
-    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-
-    if (alloc_cluster_offset == INV_OFFSET) {
-        /* Allocate, if necessary at a given offset in the image file */
-        alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
-                               start_of_cluster(s, *host_offset);
-        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
-                                      &nb_clusters);
-        if (ret < 0) {
-            goto fail;
-        }
-
-        /* Can't extend contiguous allocation */
-        if (nb_clusters == 0) {
-            *bytes = 0;
-            return 0;
-        }
-
-        assert(alloc_cluster_offset != INV_OFFSET);
+    /* Can't extend contiguous allocation */
+    if (nb_clusters == 0) {
+        *bytes = 0;
+        ret = 0;
+        goto out;
     }
 
+    assert(alloc_cluster_offset != INV_OFFSET);
+
     /*
      * Save info needed for meta data update.
      *
@@ -1476,13 +1501,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
-                      m, keep_old_clusters);
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
+                      m, false);
 
-    return 1;
+    ret = 1;
 
-fail:
-    if (*m && (*m)->nb_clusters > 0) {
+out:
+    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
+    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
         QLIST_REMOVE(*m, next_in_flight);
     }
     return ret;
-- 
2.20.1




* [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (2 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 15:22   ` Eric Blake
  2020-02-20 15:39   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature Alberto Garcia
                   ` (23 subsequent siblings)
  27 siblings, 2 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().
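
A minimal standalone sketch of the accessor pattern, assuming a raw
byte buffer in place of the cached L2 slice. SketchState and the
sketch_* names are hypothetical and not part of QEMU; the byte loops
stand in for be64_to_cpu() and cpu_to_be64(). Callers never touch the
on-disk byte order directly, so the entry layout can change later
without modifying them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BDRVQcow2State: only what this sketch needs. */
typedef struct SketchState {
    int l2_slice_size;   /* number of entries in one L2 slice */
} SketchState;

/* Read the 64-bit big-endian L2 entry at index idx of a raw slice buffer. */
static uint64_t sketch_get_l2_entry(const SketchState *s,
                                    const uint8_t *l2_slice, int idx)
{
    const uint8_t *p;
    uint64_t v = 0;
    int i;

    assert(idx >= 0 && idx < s->l2_slice_size);
    p = l2_slice + (size_t)idx * sizeof(uint64_t);
    for (i = 0; i < 8; i++) {
        v = (v << 8) | p[i];          /* most significant byte first */
    }
    return v;
}

/* Store a host-order value as a 64-bit big-endian L2 entry at index idx. */
static void sketch_set_l2_entry(const SketchState *s, uint8_t *l2_slice,
                                int idx, uint64_t entry)
{
    uint8_t *p;
    int i;

    assert(idx >= 0 && idx < s->l2_slice_size);
    p = l2_slice + (size_t)idx * sizeof(uint64_t);
    for (i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(entry & 0xff);   /* least significant byte last */
        entry >>= 8;
    }
}
```

A caller that used to write l2_slice[i] = cpu_to_be64(x) would call
sketch_set_l2_entry(s, l2_slice, i, x) instead and read the value back
with sketch_get_l2_entry(s, l2_slice, i).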

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c  | 65 ++++++++++++++++++++++--------------------
 block/qcow2-refcount.c | 17 +++++------
 block/qcow2.h          | 12 ++++++++
 3 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9387f15866..683c9569ad 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -379,12 +379,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
     QCow2ClusterType first_cluster_type;
     uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t offset = first_entry & mask;
 
     first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -397,7 +398,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
            first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
         if (offset + (uint64_t) i * cluster_size != l2_entry) {
             break;
         }
@@ -413,14 +414,16 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
                                                  int nb_clusters,
                                                  uint64_t *l2_slice,
+                                                 int l2_index,
                                                  QCow2ClusterType wanted_type)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
 
     assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
            wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = be64_to_cpu(l2_slice[i]);
+        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
         QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
         if (type != wanted_type) {
@@ -566,7 +569,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
-    *cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    *cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -601,14 +604,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
         c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  &l2_slice[l2_index], type);
+                                                  l2_slice, l2_index, type);
         *cluster_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_ZERO);
+                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -761,7 +764,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     /* Compression can't overwrite anything. Fail if the cluster was already
      * allocated. */
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
     if (cluster_offset & L2E_OFFSET_MASK) {
         qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
         return -EIO;
@@ -786,7 +789,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-    l2_slice[l2_index] = cpu_to_be64(cluster_offset);
+    set_l2_entry(s, l2_slice, l2_index, cluster_offset);
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
@@ -978,12 +981,12 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
          * cluster the second one has to do RMW (which is done above by
          * perform_cow()), update l2 table with its cluster pointer and free
          * old cluster. This is what this loop does */
-        if (l2_slice[l2_index + i] != 0) {
-            old_cluster[j++] = l2_slice[l2_index + i];
+        if (get_l2_entry(s, l2_slice, l2_index + i) != 0) {
+            old_cluster[j++] = get_l2_entry(s, l2_slice, l2_index + i);
         }
 
-        l2_slice[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+        set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
+                     (cluster_offset + (i << s->cluster_bits)));
      }
 
 
@@ -997,8 +1000,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      */
     if (!m->keep_old_clusters && j != 0) {
         for (i = 0; i < j; i++) {
-            qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
-                                    QCOW2_DISCARD_NEVER);
+            qcow2_free_any_clusters(bs, old_cluster[i], 1, QCOW2_DISCARD_NEVER);
         }
     }
 
@@ -1058,7 +1060,7 @@ static void calculate_l2_meta(BlockDriverState *bs,
     if (keep_old) {
         int i;
         for (i = 0; i < nb_clusters; i++) {
-            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
             if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
                 break;
             }
@@ -1069,7 +1071,7 @@ static void calculate_l2_meta(BlockDriverState *bs,
     }
 
     /* Get the L2 entry from the first cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1079,7 +1081,7 @@ static void calculate_l2_meta(BlockDriverState *bs,
     }
 
     /* Get the L2 entry from the last cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1150,12 +1152,12 @@ static int count_single_write_clusters(BlockDriverState *bs, int nb_clusters,
                                        bool new_alloc)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         if (cluster_needs_new_alloc(bs, l2_entry) != new_alloc) {
             break;
         }
@@ -1289,7 +1291,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     if (!cluster_needs_new_alloc(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
@@ -1670,7 +1672,7 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     for (i = 0; i < nb_clusters; i++) {
         uint64_t old_l2_entry;
 
-        old_l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * If full_discard is false, make sure that a discarded area reads back
@@ -1710,9 +1712,9 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (!full_discard && s->qcow_version >= 3) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
-            l2_slice[l2_index + i] = cpu_to_be64(0);
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
         }
 
         /* Then decrease the refcount */
@@ -1792,7 +1794,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         uint64_t old_offset;
         QCow2ClusterType cluster_type;
 
-        old_offset = be64_to_cpu(l2_slice[l2_index + i]);
+        old_offset = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * Minimize L2 changes if the cluster already reads back as
@@ -1806,10 +1808,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            l2_slice[l2_index + i] |= cpu_to_be64(QCOW_OFLAG_ZERO);
+            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
+            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
         }
     }
 
@@ -1947,7 +1950,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             }
 
             for (j = 0; j < s->l2_slice_size; j++) {
-                uint64_t l2_entry = be64_to_cpu(l2_slice[j]);
+                uint64_t l2_entry = get_l2_entry(s, l2_slice, j);
                 int64_t offset = l2_entry & L2E_OFFSET_MASK;
                 QCow2ClusterType cluster_type =
                     qcow2_get_cluster_type(bs, l2_entry);
@@ -1961,7 +1964,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                     if (!bs->backing) {
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
-                        l2_slice[j] = 0;
+                        set_l2_entry(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
                     }
@@ -2024,9 +2027,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 }
 
                 if (l2_refcount == 1) {
-                    l2_slice[j] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
+                    set_l2_entry(s, l2_slice, j, offset | QCOW_OFLAG_COPIED);
                 } else {
-                    l2_slice[j] = cpu_to_be64(offset);
+                    set_l2_entry(s, l2_slice, j, offset);
                 }
                 l2_dirty = true;
             }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index f67ac6b2d8..223048569e 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1309,7 +1309,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     uint64_t cluster_index;
                     uint64_t offset;
 
-                    entry = be64_to_cpu(l2_slice[j]);
+                    entry = get_l2_entry(s, l2_slice, j);
                     old_entry = entry;
                     entry &= ~QCOW_OFLAG_COPIED;
                     offset = entry & L2E_OFFSET_MASK;
@@ -1383,7 +1383,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                             qcow2_cache_set_dependency(bs, s->l2_table_cache,
                                                        s->refcount_block_cache);
                         }
-                        l2_slice[j] = cpu_to_be64(entry);
+                        set_l2_entry(s, l2_slice, j, entry);
                         qcow2_cache_entry_mark_dirty(s->l2_table_cache,
                                                      l2_slice);
                     }
@@ -1616,7 +1616,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 
     /* Do the actual checks */
     for(i = 0; i < s->l2_size; i++) {
-        l2_entry = be64_to_cpu(l2_table[i]);
+        l2_entry = get_l2_entry(s, l2_table, i);
 
         switch (qcow2_get_cluster_type(bs, l2_entry)) {
         case QCOW2_CLUSTER_COMPRESSED:
@@ -1685,7 +1685,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
-                        l2_table[i] = cpu_to_be64(l2_entry);
+                        set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, sizeof(uint64_t), false);
                         if (ret < 0) {
@@ -1913,7 +1913,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         for (j = 0; j < s->l2_size; j++) {
-            uint64_t l2_entry = be64_to_cpu(l2_table[j]);
+            uint64_t l2_entry = get_l2_entry(s, l2_table, j);
             uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
             QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
 
@@ -1936,9 +1936,10 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                             "l2_entry=%" PRIx64 " refcount=%" PRIu64 "\n",
                             repair ? "Repairing" : "ERROR", l2_entry, refcount);
                     if (repair) {
-                        l2_table[j] = cpu_to_be64(refcount == 1
-                                    ? l2_entry |  QCOW_OFLAG_COPIED
-                                    : l2_entry & ~QCOW_OFLAG_COPIED);
+                        set_l2_entry(s, l2_table, j,
+                                     refcount == 1 ?
+                                     l2_entry |  QCOW_OFLAG_COPIED :
+                                     l2_entry & ~QCOW_OFLAG_COPIED);
                         l2_dirty++;
                     }
                 }
diff --git a/block/qcow2.h b/block/qcow2.h
index 0942126232..6823d3f68f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -495,6 +495,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                    int idx)
+{
+    return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                int idx, uint64_t entry)
+{
+    l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




* [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (3 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 14:28   ` Eric Blake
                     ` (2 more replies)
  2019-12-22 11:36 ` [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function Alberto Garcia
                   ` (22 subsequent siblings)
  27 siblings, 3 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries with information that indicates the
allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.
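
The arithmetic behind the two documented formulas can be sketched as
follows. This is only an illustration: the sketch_* helpers are
hypothetical, and the 8 and 16 byte entry sizes are the 64-bit and
128-bit entries described in the documentation below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Entries that fit in one L2 table (an L2 table occupies one cluster). */
static uint64_t sketch_l2_entries(uint64_t cluster_size, bool extended_l2)
{
    uint64_t entry_size = extended_l2 ? 16 : 8;   /* bytes per L2 entry */

    assert(cluster_size >= entry_size && cluster_size % entry_size == 0);
    return cluster_size / entry_size;
}

/* L2 cache size in bytes needed to map disk_size bytes of guest data. */
static uint64_t sketch_l2_cache_size(uint64_t disk_size,
                                     uint64_t cluster_size, bool extended_l2)
{
    uint64_t entry_size = extended_l2 ? 16 : 8;
    uint64_t clusters;

    assert(cluster_size > 0);
    /* One L2 entry per guest cluster; round a partial cluster up. */
    clusters = (disk_size + cluster_size - 1) / cluster_size;
    return clusters * entry_size;
}
```

For a 1 GiB image with 64 KiB clusters this gives 128 KiB of L2 cache
with standard entries and 256 KiB with extended entries, which matches
l2_cache_size = disk_size * 16 / cluster_size.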

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
 docs/qcow2-cache.txt   | 19 +++++++++++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..d34261f955 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
                     as the maximum cluster size and won't be able to open images
                     with larger cluster sizes.
 
+                    Note: if the image has Extended L2 Entries then cluster_bits
+                    must be at least 14 (i.e. 16384 byte clusters).
+
          24 - 31:   size
                     Virtual disk size in bytes.
 
@@ -109,7 +112,12 @@ in the description of a field.
                                 An External Data File Name header extension may
                                 be present if this bit is set.
 
-                    Bits 3-63:  Reserved (set to 0)
+                    Bit 3:      Extended L2 Entries.  If this bit is set then
+                                L2 table entries use an extended format that
+                                allows subcluster-based allocation. See the
+                                Extended L2 Entries section for more details.
+
+                    Bits 4-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-    l2_entries = (cluster_size / sizeof(uint64_t))
+    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
 
     l2_index = (offset / cluster_size) % l2_entries
     l1_index = (offset / cluster_size) / l2_entries
@@ -447,6 +455,8 @@ obtained as follows:
 
     return cluster_offset + (offset % cluster_size)
 
+    [*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
     Bit  0 -  8:    Reserved (set to 0)
@@ -487,7 +497,8 @@ Standard Cluster Descriptor:
                     nor is data read from the backing file if the cluster is
                     unallocated.
 
-                    With version 2, this is always 0.
+                    With version 2 or with extended L2 entries (see the next
+                    section), this is always 0.
 
          1 -  8:    Reserved (set to 0)
 
@@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated like in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+    Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+                    1: the subcluster is allocated. In this case the
+                       host cluster offset field must contain a valid
+                       offset.
+                    0: the subcluster is not allocated. In this case
+                       read requests shall go to the backing file or
+                       return zeros if there is no backing file data.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 31 - x)
+
+        32 -  63:   Subcluster reads as zeros (one bit per subcluster)
+
+                    1: the subcluster reads as zeros. In this case the
+                       allocation status bit must be unset. The host
+                       cluster offset field may or may not be set.
+                    0: no effect.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 63 - x)
+
+Subcluster Allocation Bitmap (for compressed clusters):
+
+    Bit  0 -  63:   Reserved (set to 0)
+                    Compressed clusters don't have subclusters,
+                    so this field is not used.
 
 == Snapshots ==
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index d57f409861..04eb4ce2f1 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@
 qcow2 L2/refcount cache configuration
 =====================================
-Copyright (C) 2015, 2018 Igalia, S.L.
+Copyright (C) 2015, 2018-2019 Igalia, S.L.
 Author: Alberto Garcia <berto@igalia.com>
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -222,3 +222,20 @@ support this functionality, and is 0 (disabled) on other platforms.
 This functionality currently relies on the MADV_DONTNEED argument for
 madvise() to actually free the memory. This is a Linux-specific feature,
 so cache-clean-interval is not supported on other systems.
+
+
+Extended L2 Entries
+-------------------
+All numbers shown in this document are valid for qcow2 images with normal
+64-bit L2 entries.
+
+Images with extended L2 entries need twice as much L2 metadata, so the L2
+cache size must be twice as large for the same disk space.
+
+   disk_size = l2_cache_size * cluster_size / 16
+
+i.e.
+
+   l2_cache_size = disk_size * 16 / cluster_size
+
+Refcount blocks are not affected by this.
-- 
2.20.1




* [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (4 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 15:24   ` Eric Blake
  2020-02-20 16:03   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
                   ` (21 subsequent siblings)
  27 siblings, 2 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all the patches needed
for subcluster support are in place, QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 6823d3f68f..1db3fc5dbc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -495,6 +495,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+    /* FIXME: Return false until this feature is complete */
+    return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1




* [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (5 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 15:28   ` Eric Blake
  2020-02-20 16:15   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index() Alberto Garcia
                   ` (20 subsequent siblings)
  27 siblings, 2 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: log2 of subcluster_size, i.e.
  1 << subcluster_bits == subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.
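
A standalone sketch of how the three fields relate, assuming the
cluster size is a power of two. SketchSubclusterInfo and the sketch_*
helpers are hypothetical; the loop in sketch_ctz() stands in for the
ctz32() call used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_SUBCLUSTERS_PER_CLUSTER 32

/* Hypothetical container for the three new fields. */
typedef struct SketchSubclusterInfo {
    int subclusters_per_cluster;
    int subcluster_size;
    int subcluster_bits;
} SketchSubclusterInfo;

/* Count trailing zero bits; for a power of two this is its log2. */
static int sketch_ctz(unsigned int v)
{
    int n = 0;

    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

static SketchSubclusterInfo sketch_subcluster_info(int cluster_bits,
                                                   bool has_subclusters)
{
    SketchSubclusterInfo info;
    int cluster_size;

    /* Same range as MIN_CLUSTER_BITS..MAX_CLUSTER_BITS in qcow2.h */
    assert(cluster_bits >= 9 && cluster_bits <= 21);
    cluster_size = 1 << cluster_bits;

    /* Images without subclusters behave as if they had exactly one. */
    info.subclusters_per_cluster =
        has_subclusters ? SKETCH_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
    info.subcluster_size = cluster_size / info.subclusters_per_cluster;
    info.subcluster_bits = sketch_ctz((unsigned int)info.subcluster_size);
    return info;
}
```

With 64 KiB clusters (cluster_bits = 16) and subclusters enabled this
yields 32 subclusters of 2048 bytes and subcluster_bits = 11; without
subclusters it yields one 65536-byte subcluster and subcluster_bits = 16.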

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 5 +++++
 block/qcow2.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3866b47946..cbd857e9c7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1378,6 +1378,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->subclusters_per_cluster =
+        has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
+    s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+    s->subcluster_bits = ctz32(s->subcluster_size);
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
diff --git a/block/qcow2.h b/block/qcow2.h
index 1db3fc5dbc..941330cfc9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -284,6 +286,9 @@ typedef struct BDRVQcow2State {
     int cluster_bits;
     int cluster_size;
     int l2_slice_size;
+    int subcluster_bits;
+    int subcluster_size;
+    int subclusters_per_cluster;
     int l2_bits;
     int l2_size;
     int l1_size;
-- 
2.20.1




* [RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (6 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 16:19   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 09/27] qcow2: Add l2_entry_size() Alberto Garcia
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).
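
As an illustration only, the arithmetic can be sketched outside of
QEMU as follows. SketchState and sketch_offset_to_sc_index() are
hypothetical names; the struct merely stands in for the two
BDRVQcow2State fields that the helper reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two BDRVQcow2State fields used here. */
typedef struct SketchState {
    int subcluster_bits;         /* log2 of the subcluster size in bytes */
    int subclusters_per_cluster; /* 32 with subclusters, 1 without */
} SketchState;

/* Same shift-and-mask arithmetic as offset_to_sc_index() in the patch. */
static inline int sketch_offset_to_sc_index(const SketchState *s,
                                            int64_t offset)
{
    /* The mask is only valid if the count is a power of two. */
    assert((s->subclusters_per_cluster &
            (s->subclusters_per_cluster - 1)) == 0);
    return (int)((offset >> s->subcluster_bits) &
                 (s->subclusters_per_cluster - 1));
}
```

With 64 KiB clusters and 32 subclusters (2 KiB each, so
subcluster_bits is 11), offset 2048 maps to index 1 and offset 65535
to index 31. Without subclusters the mask is 0, so the result is
always 0.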

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 941330cfc9..523bc489a5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -566,6 +566,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State *s, int64_t offset)
     return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+    return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
     return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1




* [RFC PATCH v3 09/27] qcow2: Add l2_entry_size()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (7 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 16:24   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we can no longer assume that L2 entries are
sizeof(uint64_t) bytes long. l2_entry_size() returns the proper size
for the image.
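
A rough standalone sketch of how the entry size feeds into the number
of entries per L2 table (the s->l2_bits change in qcow2_do_open()).
All sketch_* names are hypothetical, and sketch_log2() only stands in
for QEMU's ctz32() on power-of-two values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Entry sizes as given by L2E_SIZE_NORMAL and L2E_SIZE_EXTENDED. */
static inline size_t sketch_l2_entry_size(bool extended)
{
    return extended ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* log2 of a power of two; stands in for ctz32() in the patch. */
static inline int sketch_log2(size_t v)
{
    int bits = 0;
    assert(v != 0 && (v & (v - 1)) == 0);
    while (v > 1) {
        v >>= 1;
        bits++;
    }
    return bits;
}

/* An L2 table is always one cluster, so wider entries mean fewer bits. */
static inline int sketch_l2_bits(int cluster_bits, bool extended)
{
    return cluster_bits - sketch_log2(sketch_l2_entry_size(extended));
}
```

For 64 KiB clusters (cluster_bits = 16) this gives 13 bits (8192
entries per table) with normal entries and 12 bits (4096 entries)
with extended ones.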

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c  | 12 ++++++------
 block/qcow2-refcount.c | 14 ++++++++------
 block/qcow2.c          |  8 ++++----
 block/qcow2.h          |  9 +++++++++
 4 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 683c9569ad..851c7e6165 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -209,7 +209,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
                    uint64_t l2_offset, uint64_t **l2_slice)
 {
     BDRVQcow2State *s = bs->opaque;
-    int start_of_slice = sizeof(uint64_t) *
+    int start_of_slice = l2_entry_size(s) *
         (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
     return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -277,7 +277,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new l2 entry */
 
-    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
     if (l2_offset < 0) {
         ret = l2_offset;
         goto fail;
@@ -301,7 +301,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new entry in the l2 cache */
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -365,7 +365,7 @@ fail:
     }
     s->l1_table[l1_index] = old_l2_offset;
     if (l2_offset > 0) {
-        qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+        qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                             QCOW2_DISCARD_ALWAYS);
     }
     return ret;
@@ -708,7 +708,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
         /* Then decrease the refcount of the old table */
         if (l2_offset) {
-            qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+            qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                                 QCOW2_DISCARD_OTHER);
         }
 
@@ -1895,7 +1895,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     int ret;
     int i, j;
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 223048569e..de85ed29a4 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1253,7 +1253,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     l2_slice = NULL;
     l1_table = NULL;
     l1_size2 = l1_size * sizeof(uint64_t);
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     s->cache_discards = true;
@@ -1604,7 +1604,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size, nb_csectors, ret;
 
     /* Read L2 table from disk */
-    l2_size = s->l2_size * sizeof(uint64_t);
+    l2_size = s->l2_size * l2_entry_size(s);
     l2_table = g_malloc(l2_size);
 
     ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1679,15 +1679,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                             fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
                             offset);
                     if (fix & BDRV_FIX_ERRORS) {
+                        int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
                         uint64_t l2e_offset =
-                            l2_offset + (uint64_t)i * sizeof(uint64_t);
+                            l2_offset + (uint64_t)i * l2_entry_size(s);
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
                         set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
-                                l2e_offset, sizeof(uint64_t), false);
+                                l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Overlap check failed\n");
                             res->check_errors++;
@@ -1697,7 +1698,8 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         }
 
                         ret = bdrv_pwrite_sync(bs->file, l2e_offset,
-                                               &l2_table[i], sizeof(uint64_t));
+                                               &l2_table[idx],
+                                               l2_entry_size(s));
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Failed to overwrite L2 "
                                     "table entry: %s\n", strerror(-ret));
@@ -1904,7 +1906,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         ret = bdrv_pread(bs->file, l2_offset, l2_table,
-                         s->l2_size * sizeof(uint64_t));
+                         s->l2_size * l2_entry_size(s));
         if (ret < 0) {
             fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
                     strerror(-ret));
diff --git a/block/qcow2.c b/block/qcow2.c
index cbd857e9c7..e7607d90d4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -868,7 +868,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
     /* An L2 table is always one cluster in size so the max cache size
      * should be a multiple of the cluster size. */
-    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
+    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * l2_entry_size(s),
                                      s->cluster_size);
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
@@ -1029,7 +1029,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
         }
     }
 
-    r->l2_slice_size = l2_cache_entry_size / sizeof(uint64_t);
+    r->l2_slice_size = l2_cache_entry_size / l2_entry_size(s);
     r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size,
                                            l2_cache_entry_size);
     r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size,
@@ -1423,7 +1423,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         bs->encrypted = true;
     }
 
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    s->l2_bits = s->cluster_bits - ctz32(l2_entry_size(s));
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
     s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
@@ -4115,7 +4115,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
          *  preallocation. All that matters is that we will not have to allocate
          *  new refcount structures for them.) */
         nb_new_l2_tables = DIV_ROUND_UP(nb_new_data_clusters,
-                                        s->cluster_size / sizeof(uint64_t));
+                                        s->cluster_size / l2_entry_size(s));
         /* The cluster range may not be aligned to L2 boundaries, so add one L2
          * table for a potential head/tail */
         nb_new_l2_tables++;
diff --git a/block/qcow2.h b/block/qcow2.h
index 523bc489a5..8be020bb76 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,10 @@
 
 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
 
+/* Size of normal and extended L2 entries */
+#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
+#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -506,6 +510,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
     return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+    return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1




* [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (8 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 09/27] qcow2: Add l2_entry_size() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 16:27   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

To support them correctly, get/set_l2_entry() need to take the entry
width into account when calculating the offset of an entry within an
L2 slice.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters; in that case it
returns 0 and the caller should ignore the value.
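
The resulting slice layout can be sketched in isolation like this.
The sketch_* helpers are hypothetical; in particular the byte-wise
big-endian accessors only stand in for be64_to_cpu()/cpu_to_be64(),
and a plain bool replaces has_subclusters(s).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Portable big-endian load, standing in for be64_to_cpu(). */
static inline uint64_t sketch_load_be64(const uint64_t *p)
{
    unsigned char b[8];
    uint64_t v = 0;
    memcpy(b, p, sizeof(b));
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

/* Portable big-endian store, standing in for cpu_to_be64(). */
static inline void sketch_store_be64(uint64_t *p, uint64_t v)
{
    unsigned char b[8];
    for (int i = 7; i >= 0; i--) {
        b[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
    memcpy(p, b, sizeof(b));
}

/* Index of the first 64-bit word of L2 entry idx within the slice. */
static inline int sketch_word_index(bool extended, int idx)
{
    return idx * (extended ? 2 : 1);
}

static inline void sketch_set_l2_bitmap(bool extended, uint64_t *slice,
                                        int idx, uint64_t bitmap)
{
    assert(extended); /* as in the patch: no bitmap without subclusters */
    sketch_store_be64(&slice[sketch_word_index(extended, idx) + 1], bitmap);
}

static inline uint64_t sketch_get_l2_bitmap(bool extended,
                                            const uint64_t *slice, int idx)
{
    if (!extended) {
        return 0; /* for convenience only; the caller ignores this */
    }
    return sketch_load_be64(&slice[sketch_word_index(extended, idx) + 1]);
}
```

With extended entries, entry 2 starts at word 4 of the slice and its
bitmap lives in word 5; with normal entries, entry 2 is simply word 2.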

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 8be020bb76..64b0a814f4 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -518,15 +518,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                     int idx)
+{
+    if (has_subclusters(s)) {
+        idx *= l2_entry_size(s) / sizeof(uint64_t);
+        return be64_to_cpu(l2_slice[idx + 1]);
+    } else {
+        /* For convenience only; the caller should ignore this value. */
+        return 0;
+    }
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                 int idx, uint64_t entry)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                 int idx, uint64_t bitmap)
+{
+    assert(has_subclusters(s));
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
+    l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




* [RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (9 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-20 17:21   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries, clusters are treated as having
exactly one subcluster, so it is possible to replace one data type
with the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
     compressed. We do not support compression at the subcluster
     level.

  b) There are two different values for unallocated subclusters:
     QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
     cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
     which means that the cluster is allocated but the subcluster is
     not. The latter can only happen in images with extended L2
     entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
     entry has a value that violates the specification. The caller is
     responsible for handling these situations.

     To prevent compatibility problems with images that have invalid
     values but are currently being read by QEMU without causing side
     effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
     with extended L2 entries.

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.
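
A simplified, standalone sketch of the per-subcluster decision for
images with extended L2 entries. The SKETCH_* names are hypothetical;
the two bit macros copy the layout of QCOW_OFLAG_SUB_ZERO() and
QCOW_OFLAG_SUB_ALLOC() from this patch. The sketch reduces the
cluster type to a single flag (true for QCOW2_CLUSTER_NORMAL, false
for QCOW2_CLUSTER_UNALLOCATED) and leaves out the compressed and
zero-cluster cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the QCOW_OFLAG_SUB_* macros in this patch. */
#define SKETCH_SUB_ZERO(x)  ((1ULL << 63) >> (x))
#define SKETCH_SUB_ALLOC(x) ((1ULL << 31) >> (x))

typedef enum SketchType {
    SKETCH_UNALLOCATED_PLAIN,
    SKETCH_UNALLOCATED_ALLOC,
    SKETCH_ZERO_PLAIN,
    SKETCH_ZERO_ALLOC,
    SKETCH_NORMAL,
    SKETCH_INVALID,
} SketchType;

static inline SketchType sketch_classify(bool cluster_allocated,
                                         uint64_t l2_bitmap,
                                         unsigned sc_index)
{
    assert(sc_index < 32);
    bool sc_zero  = l2_bitmap & SKETCH_SUB_ZERO(sc_index);
    bool sc_alloc = l2_bitmap & SKETCH_SUB_ALLOC(sc_index);

    if (sc_zero && sc_alloc) {
        return SKETCH_INVALID;      /* both bits set is never valid */
    }
    if (cluster_allocated) {
        if (sc_alloc) {
            return SKETCH_NORMAL;
        }
        return sc_zero ? SKETCH_ZERO_ALLOC : SKETCH_UNALLOCATED_ALLOC;
    }
    if (sc_alloc) {
        return SKETCH_INVALID;      /* allocated bit without a cluster */
    }
    return sc_zero ? SKETCH_ZERO_PLAIN : SKETCH_UNALLOCATED_PLAIN;
}
```

For example, in an allocated cluster with bitmap
SKETCH_SUB_ALLOC(0) | SKETCH_SUB_ZERO(1), subcluster 0 is normal,
subcluster 1 reads as zeroes and subcluster 2 is unallocated within
an allocated cluster.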

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 64b0a814f4..321ba9550f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,15 @@
 
 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)    ((1ULL << 63) >> (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   ((1ULL << 31) >> (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xFFFFFFFF00000000ULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0x00000000FFFFFFFFULL
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -455,6 +464,16 @@ typedef enum QCow2ClusterType {
     QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+    QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+    QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+    QCOW2_SUBCLUSTER_ZERO_PLAIN,
+    QCOW2_SUBCLUSTER_ZERO_ALLOC,
+    QCOW2_SUBCLUSTER_NORMAL,
+    QCOW2_SUBCLUSTER_COMPRESSED,
+    QCOW2_SUBCLUSTER_INVALID,
+} QCow2SubclusterType;
+
 typedef enum QCow2MetadataOverlap {
     QCOW2_OL_MAIN_HEADER_BITNR      = 0,
     QCOW2_OL_ACTIVE_L1_BITNR        = 1,
@@ -632,6 +651,79 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }
 
+/* For an image without extended L2 entries, return the
+ * QCow2SubclusterType equivalent of a given QCow2ClusterType */
+static inline
+QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
+{
+    switch (type) {
+    case QCOW2_CLUSTER_COMPRESSED:
+        return QCOW2_SUBCLUSTER_COMPRESSED;
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+        return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+    case QCOW2_CLUSTER_NORMAL:
+        return QCOW2_SUBCLUSTER_NORMAL;
+    case QCOW2_CLUSTER_UNALLOCATED:
+        return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/* In an image without subclusters @l2_bitmap is ignored and
+ * @sc_index must be 0. */
+static inline
+QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
+                                              uint64_t l2_entry,
+                                              uint64_t l2_bitmap,
+                                              unsigned sc_index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+    assert(sc_index < s->subclusters_per_cluster);
+
+    if (has_subclusters(s)) {
+        bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+        bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            if (l2_bitmap != 0) {
+                return QCOW2_SUBCLUSTER_INVALID;
+            }
+            return QCOW2_SUBCLUSTER_COMPRESSED;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_SUBCLUSTER_INVALID;
+        case QCOW2_CLUSTER_NORMAL:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_SUBCLUSTER_NORMAL;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_SUBCLUSTER_INVALID;
+            }
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_SUBCLUSTER_INVALID;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_SUBCLUSTER_INVALID;
+            }
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        return qcow2_cluster_to_subcluster_type(type);
+    }
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1




* [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (10 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 11:35   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_cluster_offset() is modified to return the subcluster
type instead of the cluster type, and all callers are updated to
replace all values of QCow2ClusterType with their QCow2SubclusterType
equivalents (as returned by qcow2_cluster_to_subcluster_type()).

This patch only changes the data types; there are no semantic changes.
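
The new calling convention (status in the return value, type through
an out-parameter) can be sketched as follows. This is not the QEMU
code: the lookup table and all sketch_* names are hypothetical and
only illustrate the pattern that callers follow after this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef enum SketchLookupType {
    SKETCH_LOOKUP_UNALLOCATED_PLAIN,
    SKETCH_LOOKUP_ZERO_PLAIN,
    SKETCH_LOOKUP_NORMAL,
} SketchLookupType;

/*
 * Hypothetical lookup over an in-memory table with one type per cluster.
 * Returns 0 on success or -errno; *type is only written on success.
 */
static int sketch_get_type(const SketchLookupType *table, unsigned n_clusters,
                           unsigned cluster, SketchLookupType *type)
{
    assert(type != NULL);
    if (cluster >= n_clusters) {
        return -EINVAL;
    }
    *type = table[cluster];
    return 0;
}

/* Caller pattern: check the status first, then branch on the type. */
static int sketch_reads_as_zero(const SketchLookupType *table,
                                unsigned n_clusters, unsigned cluster)
{
    SketchLookupType type;
    int ret = sketch_get_type(table, n_clusters, cluster, &type);
    if (ret < 0) {
        return ret;
    }
    return type == SKETCH_LOOKUP_ZERO_PLAIN;
}
```

Keeping the error code and the type in separate variables means that
a caller can no longer confuse a type value with a status.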

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 19 +++++-----
 block/qcow2.c         | 82 +++++++++++++++++++++++++------------------
 block/qcow2.h         |  3 +-
 3 files changed, 60 insertions(+), 44 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 851c7e6165..40c2e34a2a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -497,21 +497,22 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
 /*
  * get_cluster_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk, find the cluster offset in
+ * the qcow2 file and store it in *cluster_offset.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type. Compressed clusters
+ * are always processed one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
-                             unsigned int *bytes, uint64_t *cluster_offset)
+                             unsigned int *bytes, uint64_t *cluster_offset,
+                             QCow2SubclusterType *subcluster_type)
 {
     BDRVQcow2State *s = bs->opaque;
     unsigned int l2_index;
@@ -653,7 +654,9 @@ out:
     assert(bytes_available - offset_in_cluster <= UINT_MAX);
     *bytes = bytes_available - offset_in_cluster;
 
-    return type;
+    *subcluster_type = qcow2_cluster_to_subcluster_type(type);
+
+    return 0;
 
 fail:
     qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index e7607d90d4..9277d680ef 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1964,6 +1964,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     BDRVQcow2State *s = bs->opaque;
     uint64_t cluster_offset;
     unsigned int bytes;
+    QCow2SubclusterType type;
     int ret, status = 0;
 
     qemu_co_mutex_lock(&s->lock);
@@ -1975,7 +1976,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     }
 
     bytes = MIN(INT_MAX, count);
-    ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
+    ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset, &type);
     qemu_co_mutex_unlock(&s->lock);
     if (ret < 0) {
         return ret;
@@ -1983,15 +1984,16 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
     *pnum = bytes;
 
-    if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
-        !s->crypto) {
+    if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+         type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
         *map = cluster_offset | offset_into_cluster(s, offset);
         *file = s->data_file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID;
     }
-    if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+    if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+        type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+    } else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2094,7 +2096,7 @@ typedef struct Qcow2AioTask {
     AioTask task;
 
     BlockDriverState *bs;
-    QCow2ClusterType cluster_type; /* only for read */
+    QCow2SubclusterType subcluster_type; /* only for read */
     uint64_t file_cluster_offset;
     uint64_t offset;
     uint64_t bytes;
@@ -2107,7 +2109,7 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
                                        AioTaskPool *pool,
                                        AioTaskFunc func,
-                                       QCow2ClusterType cluster_type,
+                                       QCow2SubclusterType subcluster_type,
                                        uint64_t file_cluster_offset,
                                        uint64_t offset,
                                        uint64_t bytes,
@@ -2121,7 +2123,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
     *task = (Qcow2AioTask) {
         .task.func = func,
         .bs = bs,
-        .cluster_type = cluster_type,
+        .subcluster_type = subcluster_type,
         .qiov = qiov,
         .file_cluster_offset = file_cluster_offset,
         .offset = offset,
@@ -2132,7 +2134,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
     trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
                          func == qcow2_co_preadv_task_entry ? "read" : "write",
-                         cluster_type, file_cluster_offset, offset, bytes,
+                         subcluster_type, file_cluster_offset, offset, bytes,
                          qiov, qiov_offset);
 
     if (!pool) {
@@ -2145,7 +2147,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 }
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
-                                             QCow2ClusterType cluster_type,
+                                             QCow2SubclusterType subc_type,
                                              uint64_t file_cluster_offset,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
@@ -2154,24 +2156,24 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
     BDRVQcow2State *s = bs->opaque;
     int offset_in_cluster = offset_into_cluster(s, offset);
 
-    switch (cluster_type) {
-    case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
+    switch (subc_type) {
+    case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
         /* Both zero types are handled in qcow2_co_preadv_part */
         g_assert_not_reached();
 
-    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
         assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
         return bdrv_co_preadv_part(bs->backing, offset, bytes,
                                    qiov, qiov_offset, 0);
 
-    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_SUBCLUSTER_COMPRESSED:
         return qcow2_co_preadv_compressed(bs, file_cluster_offset,
                                           offset, bytes, qiov, qiov_offset);
 
-    case QCOW2_CLUSTER_NORMAL:
+    case QCOW2_SUBCLUSTER_NORMAL:
         if ((file_cluster_offset & 511) != 0) {
             return -EIO;
         }
@@ -2199,8 +2201,9 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
 
     assert(!t->l2meta);
 
-    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->file_cluster_offset,
-                                t->offset, t->bytes, t->qiov, t->qiov_offset);
+    return qcow2_co_preadv_task(t->bs, t->subcluster_type,
+                                t->file_cluster_offset, t->offset, t->bytes,
+                                t->qiov, t->qiov_offset);
 }
 
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
@@ -2212,6 +2215,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
     int ret = 0;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t cluster_offset = 0;
+    QCow2SubclusterType type;
     AioTaskPool *aio = NULL;
 
     while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2223,22 +2227,23 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         }
 
         qemu_co_mutex_lock(&s->lock);
-        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
+        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes,
+                                       &cluster_offset, &type);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
             goto out;
         }
 
-        if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
-            ret == QCOW2_CLUSTER_ZERO_ALLOC ||
-            (ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+        if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+            type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+            (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
             if (!aio && cur_bytes != bytes) {
                 aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
             }
-            ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, ret,
+            ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, type,
                                  cluster_offset, offset, cur_bytes,
                                  qiov, qiov_offset, NULL);
             if (ret < 0) {
@@ -2469,7 +2474,7 @@ static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
 {
     Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
 
-    assert(!t->cluster_type);
+    assert(!t->subcluster_type);
 
     return qcow2_co_pwritev_task(t->bs, t->file_cluster_offset,
                                  t->offset, t->bytes, t->qiov, t->qiov_offset,
@@ -3723,6 +3728,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     if (head || tail) {
         uint64_t off;
         unsigned int nr;
+        QCow2SubclusterType type;
 
         assert(head + bytes <= s->cluster_size);
 
@@ -3738,10 +3744,14 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
         bytes = s->cluster_size;
         nr = s->cluster_size;
-        ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
-        if (ret != QCOW2_CLUSTER_UNALLOCATED &&
-            ret != QCOW2_CLUSTER_ZERO_PLAIN &&
-            ret != QCOW2_CLUSTER_ZERO_ALLOC) {
+        ret = qcow2_get_cluster_offset(bs, offset, &nr, &off, &type);
+        if (ret < 0) {
+            qemu_co_mutex_unlock(&s->lock);
+            return -ENOTSUP;
+        }
+        if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+            type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
+            type != QCOW2_SUBCLUSTER_ZERO_ALLOC) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
@@ -3799,17 +3809,19 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
     while (bytes != 0) {
         uint64_t copy_offset = 0;
+        QCow2SubclusterType type;
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
         cur_write_flags = write_flags;
 
-        ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
+        ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes,
+                                       &copy_offset, &type);
         if (ret < 0) {
             goto out;
         }
 
-        switch (ret) {
-        case QCOW2_CLUSTER_UNALLOCATED:
+        switch (type) {
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
@@ -3824,16 +3836,16 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
             }
             break;
 
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
             cur_write_flags |= BDRV_REQ_ZERO_WRITE;
             break;
 
-        case QCOW2_CLUSTER_COMPRESSED:
+        case QCOW2_SUBCLUSTER_COMPRESSED:
             ret = -ENOTSUP;
             goto out;
 
-        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_NORMAL:
             child = s->data_file;
             copy_offset += offset_into_cluster(s, src_offset);
             if ((copy_offset & 511) != 0) {
diff --git a/block/qcow2.h b/block/qcow2.h
index 321ba9550f..37b7e25989 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -831,7 +831,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
                           uint8_t *buf, int nb_sectors, bool enc, Error **errp);
 
 int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
-                             unsigned int *bytes, uint64_t *cluster_offset);
+                             unsigned int *bytes, uint64_t *cluster_offset,
+                             QCow2SubclusterType *subcluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
                                unsigned int *bytes, uint64_t *host_offset,
                                QCowL2Meta **m);
-- 
2.20.1




* [RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (11 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 12:02   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.
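
As a rough illustration (not part of the patch), the read-side handling
that results from these hunks can be modelled as below. The enum and
function names are made up for the sketch. Only the mapping from
subcluster type to data source follows qcow2_co_preadv_part(),
qcow2_co_preadv_task() and qcow2_co_block_status(), and the !s->crypto
condition is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QCow2SubclusterType (names shortened). */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_COMPRESSED,
} SketchSubclusterType;

/* Where a read request gets its data from (illustrative only). */
typedef enum {
    READ_ZEROES,
    READ_BACKING,
    READ_DATA,
    READ_COMPRESSED,
} SketchReadSource;

/* Both UNALLOCATED_* values behave the same way on reads: fall through
 * to the backing file if there is one, otherwise read as zeroes. */
static SketchReadSource sketch_read_source(SketchSubclusterType type,
                                           bool has_backing)
{
    switch (type) {
    case SC_ZERO_PLAIN:
    case SC_ZERO_ALLOC:
        return READ_ZEROES;
    case SC_UNALLOCATED_PLAIN:
    case SC_UNALLOCATED_ALLOC:
        return has_backing ? READ_BACKING : READ_ZEROES;
    case SC_COMPRESSED:
        return READ_COMPRESSED;
    case SC_NORMAL:
        return READ_DATA;
    }
    assert(0); /* all enum values are handled above */
    return READ_ZEROES;
}

/* The *_ALLOC types and NORMAL come with a host offset, so block status
 * can report a valid mapping for them. */
static bool sketch_has_host_offset(SketchSubclusterType type)
{
    return type == SC_NORMAL ||
           type == SC_ZERO_ALLOC ||
           type == SC_UNALLOCATED_ALLOC;
}
```

The point of the sketch is that UNALLOCATED_ALLOC differs from
UNALLOCATED_PLAIN only in that a host offset exists; it still carries no
guest data of its own.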

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 9277d680ef..1d3da0ccf6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1985,7 +1985,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     *pnum = bytes;
 
     if ((type == QCOW2_SUBCLUSTER_NORMAL ||
-         type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+         type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+         type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
         *map = cluster_offset | offset_into_cluster(s, offset);
         *file = s->data_file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID;
@@ -1993,7 +1994,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
         type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+    } else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+               type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2163,6 +2165,7 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
         g_assert_not_reached();
 
     case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
         assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2236,7 +2239,8 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
         if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
             type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-            (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+            (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+            (type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
@@ -3750,6 +3754,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
             return -ENOTSUP;
         }
         if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+            type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
             type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
             type != QCOW2_SUBCLUSTER_ZERO_ALLOC) {
             qemu_co_mutex_unlock(&s->lock);
@@ -3822,6 +3827,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
         switch (type) {
         case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
-- 
2.20.1




* [RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (12 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 13:34   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
                   ` (13 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

   - If the cluster is new, then subclusters #0 to #3 from the old
     cluster must be copied into the new one.

   - If the cluster is new but the old cluster was unallocated, then
     only subcluster #3 needs copy-on-write. #0 to #2 are marked as
     unallocated in the bitmap of the new L2 entry.

   - If we are overwriting an old cluster and subcluster #3 is
     unallocated or has the all-zeroes bit set then we need
     copy-on-write on subcluster #3.

   - If we are overwriting an old cluster and subcluster #3 was
     allocated then there is no need to copy-on-write.
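
The four cases above can be checked with a small standalone model of
how the start of the head COW region is chosen. This is only a sketch
that mirrors the two switch statements in the patch. The type and
function names are invented here, and the numbers assume 2 KiB
subclusters (subcluster_bits = 11) with a write starting in the middle
of subcluster #3, i.e. at offset 7168 into the cluster.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QCow2SubclusterType (names shortened). */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_COMPRESSED,
} SketchSubclusterType;

/*
 * Return the offset (within the cluster) where the head COW region
 * starts. @write_start plays the role of cow_start_to and @type is the
 * type of the subcluster that contains the first written byte.
 */
static unsigned sketch_cow_start_from(SketchSubclusterType type,
                                      bool keep_old,
                                      unsigned write_start,
                                      unsigned subcluster_bits)
{
    /* Start of the subcluster containing @write_start */
    unsigned sc_start = (write_start >> subcluster_bits) << subcluster_bits;

    if (!keep_old) {
        switch (type) {
        case SC_NORMAL:
        case SC_COMPRESSED:
        case SC_ZERO_ALLOC:
        case SC_UNALLOCATED_ALLOC:
            return 0;           /* copy from the start of the cluster */
        case SC_ZERO_PLAIN:
        case SC_UNALLOCATED_PLAIN:
            return sc_start;    /* only the first touched subcluster */
        }
    } else {
        switch (type) {
        case SC_NORMAL:
            return write_start; /* empty COW region */
        case SC_ZERO_ALLOC:
        case SC_UNALLOCATED_ALLOC:
            return sc_start;
        default:
            break;
        }
    }
    assert(0); /* combinations that the patch does not expect */
    return 0;
}
```

With these assumptions the new-cluster cases give 0 and 6144, and the
overwrite cases give 6144 and 7168 (the latter meaning nothing needs to
be copied).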

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 140 +++++++++++++++++++++++++++++++++---------
 1 file changed, 110 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 40c2e34a2a..c6eb480ee8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1041,56 +1041,128 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 1 on success, -errno on failure (in order to match the
+ * return value of handle_copied() and handle_alloc()).
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-                              uint64_t host_cluster_offset,
-                              uint64_t guest_offset, unsigned bytes,
-                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_cluster_offset,
+                             uint64_t guest_offset, unsigned bytes,
+                             uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    int l2_index = offset_to_l2_slice_index(s, guest_offset);
-    uint64_t l2_entry;
+    int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry, l2_bitmap;
     unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
 
     assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-    /* Return if there's no COW (all clusters are normal and we keep them) */
+    /* Return if there's no COW (all subclusters are normal and we are
+     * keeping the clusters) */
     if (keep_old) {
+        unsigned first_sc = cow_start_to / s->subcluster_size;
+        unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
         int i;
-        for (i = 0; i < nb_clusters; i++) {
-            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+        for (i = first_sc; i <= last_sc; i++) {
+            unsigned c = i / s->subclusters_per_cluster;
+            unsigned sc = i % s->subclusters_per_cluster;
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+            l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+            type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+            if (type == QCOW2_SUBCLUSTER_INVALID) {
+                l2_index += c; /* Point to the invalid entry */
+                goto fail;
+            }
+            if (type != QCOW2_SUBCLUSTER_NORMAL) {
                 break;
             }
         }
-        if (i == nb_clusters) {
-            return;
+        if (i == last_sc + 1) {
+            return 1;
         }
     }
 
     /* Get the L2 entry from the first cluster */
     l2_entry = get_l2_entry(s, l2_slice, l2_index);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_start_from = cow_start_to;
+    if (type == QCOW2_SUBCLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_COMPRESSED:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_start_from = 0;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_start_from = 0;
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+            cow_start_from = cow_start_to;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     /* Get the L2 entry from the last cluster */
-    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_index += nb_clusters - 1;
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_end_to = cow_end_from;
+    if (type == QCOW2_SUBCLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_COMPRESSED:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+            cow_end_to = cow_end_from;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     *m = g_malloc0(sizeof(**m));
@@ -1115,6 +1187,18 @@ static void calculate_l2_meta(BlockDriverState *bs,
 
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+fail:
+    if (type == QCOW2_SUBCLUSTER_INVALID) {
+        uint64_t l1_index = offset_to_l1_index(s, guest_offset);
+        uint64_t l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found"
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        return -EIO;
+    }
+
+    return 1;
 }
 
 /* Returns true if writing to the cluster pointed to by @l2_entry
@@ -1328,10 +1412,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
                  - offset_into_cluster(s, guest_offset));
         assert(*bytes != 0);
 
-        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
-                          *bytes, l2_slice, m, true);
-
-        ret = 1;
+        ret = calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK,
+                                guest_offset, *bytes, l2_slice, m, true);
     } else {
         ret = 0;
     }
@@ -1506,10 +1588,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
-                      m, false);
-
-    ret = 1;
+    ret = calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                            l2_slice, m, false);
 
 out:
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1




* [RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (13 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 14:21   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.
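
To make the counting rule easier to follow, here is a simplified
standalone model of it. It is a sketch under several assumptions: each
cluster is described by a made-up SketchCluster struct holding a host
offset and one already-decoded type per subcluster, and there are only
4 subclusters per cluster. The real function reads the L2 entry and
bitmap instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SC_PER_CLUSTER 4

/* Hypothetical stand-in for QCow2SubclusterType (names shortened). */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_COMPRESSED,
} SketchSubclusterType;

/* Illustrative view of one L2 entry: host offset plus decoded types. */
typedef struct {
    uint64_t host_offset;
    SketchSubclusterType sc[SKETCH_SC_PER_CLUSTER];
} SketchCluster;

/* Count subclusters of the same type as cl[0].sc[sc_index], stopping at
 * the first type change or at a break in host-offset contiguity. */
static int sketch_count_contiguous(const SketchCluster *cl, int nb_clusters,
                                   unsigned sc_index, uint64_t cluster_size)
{
    SketchSubclusterType type = cl[0].sc[sc_index];
    uint64_t expected_offset = cl[0].host_offset;
    bool check_offset = true;
    int count = 0;

    if (type == SC_COMPRESSED) {
        /* Compressed clusters are handled one at a time */
        return SKETCH_SC_PER_CLUSTER - sc_index;
    }
    if (type == SC_UNALLOCATED_PLAIN || type == SC_ZERO_PLAIN) {
        check_offset = false; /* no host offset to compare */
    }

    for (int i = 0; i < nb_clusters; i++) {
        if (check_offset && cl[i].host_offset != expected_offset) {
            return count;
        }
        for (unsigned j = (i == 0) ? sc_index : 0;
             j < SKETCH_SC_PER_CLUSTER; j++) {
            if (cl[i].sc[j] != type) {
                return count;
            }
            count++;
        }
        expected_offset += cluster_size;
    }
    return count;
}

/* Two 64 KiB clusters, starting the count at subcluster #1 of the first. */
static int sketch_demo(uint64_t second_host_offset)
{
    SketchCluster cl[2] = {
        { 0x10000, { SC_NORMAL, SC_NORMAL, SC_NORMAL, SC_NORMAL } },
        { second_host_offset,
          { SC_NORMAL, SC_ZERO_ALLOC, SC_NORMAL, SC_NORMAL } },
    };
    return sketch_count_contiguous(cl, 2, 1, 0x10000);
}
```

In the demo, a second cluster at 0x20000 is contiguous with the first,
so the count runs across the cluster boundary until the ZERO_ALLOC
subcluster. Any other host offset stops the count at the end of the
first cluster.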

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 136 ++++++++++++++++++++----------------------
 block/qcow2.h         |  36 +++++------
 2 files changed, 80 insertions(+), 92 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c6eb480ee8..c10601a828 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -372,66 +372,55 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+                                        unsigned sc_index, uint64_t *l2_slice,
+                                        int l2_index)
 {
     BDRVQcow2State *s = bs->opaque;
-    int i;
-    QCow2ClusterType first_cluster_type;
-    uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-    uint64_t offset = first_entry & mask;
-
-    first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-    if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-        return 0;
+    int i, j, count = 0;
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+    bool check_offset = true;
+    QCow2SubclusterType type =
+        qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+    assert(type != QCOW2_SUBCLUSTER_INVALID); /* The caller should check this */
+
+    if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
+        /* Compressed clusters are always processed one by one */
+        return s->subclusters_per_cluster - sc_index;
     }
 
-    /* must be allocated */
-    assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-           first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-    for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-        if (offset + (uint64_t) i * cluster_size != l2_entry) {
-            break;
-        }
+    if (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN ||
+        type == QCOW2_SUBCLUSTER_ZERO_PLAIN) {
+        check_offset = false;
     }
 
-        return i;
-}
-
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * slice have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
-                                                 int nb_clusters,
-                                                 uint64_t *l2_slice,
-                                                 int l2_index,
-                                                 QCow2ClusterType wanted_type)
-{
-    BDRVQcow2State *s = bs->opaque;
-    int i;
-
-    assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-           wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-        QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
-
-        if (type != wanted_type) {
-            break;
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+        l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+        if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            goto out;
+        }
+        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
+            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
+                goto out;
+            }
+            count++;
         }
+        expected_offset += s->cluster_size;
     }
 
-    return i;
+out:
+    return count;
 }
 
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
@@ -515,12 +504,12 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
                              QCow2SubclusterType *subcluster_type)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned int l2_index;
-    uint64_t l1_index, l2_offset, *l2_slice;
-    int c;
+    unsigned int l2_index, sc_index;
+    uint64_t l1_index, l2_offset, *l2_slice, l2_bitmap;
+    int sc;
     unsigned int offset_in_cluster;
     uint64_t bytes_available, bytes_needed, nb_clusters;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
     int ret;
 
     offset_in_cluster = offset_into_cluster(s, offset);
@@ -570,7 +559,9 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
+    sc_index = offset_to_sc_index(s, offset);
     *cluster_offset = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,9 +569,9 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
      * true */
     assert(nb_clusters <= INT_MAX);
 
-    type = qcow2_get_cluster_type(bs, *cluster_offset);
-    if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
-                                type == QCOW2_CLUSTER_ZERO_ALLOC)) {
+    type = qcow2_get_subcluster_type(bs, *cluster_offset, l2_bitmap, sc_index);
+    if (s->qcow_version < 3 && (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+                                type == QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
         qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
                                 " in pre-v3 image (L2 offset: %#" PRIx64
                                 ", L2 index: %#x)", l2_offset, l2_index);
@@ -588,7 +579,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         goto fail;
     }
     switch (type) {
-    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_SUBCLUSTER_INVALID:
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found"
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        ret = -EIO;
+        goto fail;
+    case QCOW2_SUBCLUSTER_COMPRESSED:
         if (has_data_file(bs)) {
             qcow2_signal_corruption(bs, true, -1, -1, "Compressed cluster "
                                     "entry found in image with external data "
@@ -598,21 +595,20 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
         /* Compressed clusters can only be processed one by one */
-        c = 1;
+        sc = s->subclusters_per_cluster - sc_index;
         *cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
         break;
-    case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_UNALLOCATED:
-        /* how many empty clusters ? */
-        c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  l2_slice, l2_index, type);
+    case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                          l2_slice, l2_index);
         *cluster_offset = 0;
         break;
-    case QCOW2_CLUSTER_ZERO_ALLOC:
-    case QCOW2_CLUSTER_NORMAL:
-        /* how many allocated clusters ? */
-        c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
+    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+    case QCOW2_SUBCLUSTER_NORMAL:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                          l2_slice, l2_index);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -641,7 +637,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
-    bytes_available = (int64_t)c * s->cluster_size;
+    bytes_available = ((int64_t)sc + sc_index) << s->subcluster_bits;
 
 out:
     if (bytes_available > bytes_needed) {
@@ -654,7 +650,7 @@ out:
     assert(bytes_available - offset_in_cluster <= UINT_MAX);
     *bytes = bytes_available - offset_in_cluster;
 
-    *subcluster_type = qcow2_cluster_to_subcluster_type(type);
+    *subcluster_type = type;
 
     return 0;
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 37b7e25989..ae7973a2c2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -651,27 +651,6 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }
 
-/* For an image without extended L2 entries, return the
- * QCow2SubclusterType equivalent of a given QCow2ClusterType */
-static inline
-QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
-{
-    switch (type) {
-    case QCOW2_CLUSTER_COMPRESSED:
-        return QCOW2_SUBCLUSTER_COMPRESSED;
-    case QCOW2_CLUSTER_ZERO_PLAIN:
-        return QCOW2_SUBCLUSTER_ZERO_PLAIN;
-    case QCOW2_CLUSTER_ZERO_ALLOC:
-        return QCOW2_SUBCLUSTER_ZERO_ALLOC;
-    case QCOW2_CLUSTER_NORMAL:
-        return QCOW2_SUBCLUSTER_NORMAL;
-    case QCOW2_CLUSTER_UNALLOCATED:
-        return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
-    default:
-        g_assert_not_reached();
-    }
-}
-
 /* In an image without subsclusters @l2_bitmap is ignored and
  * @sc_index must be 0. */
 static inline
@@ -720,7 +699,20 @@ QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
             g_assert_not_reached();
         }
     } else {
-        return qcow2_cluster_to_subcluster_type(type);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            return QCOW2_SUBCLUSTER_COMPRESSED;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+            return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+        case QCOW2_CLUSTER_NORMAL:
+            return QCOW2_SUBCLUSTER_NORMAL;
+        case QCOW2_CLUSTER_UNALLOCATED:
+            return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+        default:
+            g_assert_not_reached();
+        }
     }
 }
 
-- 
2.20.1




* [RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (14 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 14:37   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.
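
A minimal sketch of the resulting update rule follows. It is not the
patch itself. The struct and helper names are invented, and the bitmap
layout (allocation bits in the low 32 bits, 'all zeroes' bits in the
high 32 bits) is an assumption taken from the extended L2 entry
proposal, so treat the constants as illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bits 0-31 allocation status, bits 32-63 zero status */
#define SKETCH_BITMAP_ALL_ALLOC  ((1ULL << 32) - 1)
#define SKETCH_BITMAP_ALL_ZEROES (SKETCH_BITMAP_ALL_ALLOC << 32)
/* Bit 0 of a standard L2 entry marks the cluster as reading zeroes */
#define SKETCH_OFLAG_ZERO        (1ULL << 0)

/* Illustrative pair of values stored for one cluster. */
typedef struct {
    uint64_t entry;
    uint64_t bitmap;
} SketchL2;

/*
 * Compute the new L2 contents when a whole cluster is zeroed.
 * @drop_mapping models the "compressed or unmap" branch, where the old
 * host cluster is freed and the entry starts from 0.
 */
static SketchL2 sketch_zero_cluster(uint64_t old_entry, uint64_t old_bitmap,
                                    bool has_subclusters, bool drop_mapping)
{
    SketchL2 res = { drop_mapping ? 0 : old_entry, old_bitmap };

    if (has_subclusters) {
        /* The zero flag in the entry is forbidden, use the bitmap */
        res.bitmap = SKETCH_BITMAP_ALL_ZEROES;
    } else {
        res.entry |= SKETCH_OFLAG_ZERO;
    }
    return res;
}

/* True if subcluster @sc has its 'all zeroes' bit set. */
static bool sketch_sc_reads_zero(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (bitmap >> (32 + sc)) & 1;
}
```

Note that with subclusters the entry keeps its host offset (unless the
mapping is dropped) and never gets the zero flag, while without them the
bitmap is simply carried through untouched.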

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c10601a828..70b0aaa00a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1870,7 +1870,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     assert(nb_clusters <= INT_MAX);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t old_offset;
+        uint64_t old_offset, l2_entry = 0;
         QCow2ClusterType cluster_type;
 
         old_offset = get_l2_entry(s, l2_slice, l2_index + i);
@@ -1887,12 +1887,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         }
+
+        if (has_subclusters(s)) {
+            set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
+        } else {
+            l2_entry |= QCOW_OFLAG_ZERO;
+        }
+
+        set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1




* [RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (15 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 14:45   ` Max Reitz
  2019-12-22 11:36 ` [RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 70b0aaa00a..207f670c94 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1790,7 +1790,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (!full_discard && s->qcow_version >= 3) {
+        if (has_subclusters(s)) {
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
+            set_l2_bitmap(s, l2_slice, l2_index + i,
+                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+        } else if (!full_discard && s->qcow_version >= 3) {
             set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
             set_l2_entry(s, l2_slice, l2_index + i, 0);
-- 
2.20.1




* [RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (16 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2019-12-22 11:36 ` Alberto Garcia
  2020-02-21 14:47   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-refcount.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index de85ed29a4..65f4fc27c3 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1685,8 +1685,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
-                        l2_entry = QCOW_OFLAG_ZERO;
-                        set_l2_entry(s, l2_table, i, l2_entry);
+                        if (has_subclusters(s)) {
+                            set_l2_entry(s, l2_table, i, 0);
+                            set_l2_bitmap(s, l2_table, i,
+                                          QCOW_L2_BITMAP_ALL_ZEROES);
+                        } else {
+                            set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+                        }
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
-- 
2.20.1




* [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (17 preceding siblings ...)
  2019-12-22 11:36 ` [RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 14:57   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Two changes are needed in order to add subcluster support to this
function: deallocated clusters must have their bitmaps cleared, and
expanded clusters must have all the "subcluster allocated" bits set.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 207f670c94..ede75138d2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2054,6 +2054,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
                         set_l2_entry(s, l2_slice, j, 0);
+                        if (has_subclusters(s)) {
+                            set_l2_bitmap(s, l2_slice, j, 0);
+                        }
                         l2_dirty = true;
                         continue;
                     }
@@ -2120,6 +2123,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 } else {
                     set_l2_entry(s, l2_slice, j, offset);
                 }
+                if (has_subclusters(s)) {
+                    set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
+                }
                 l2_dirty = true;
             }
 
-- 
2.20.1




* [RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (18 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 15:01   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
                   ` (7 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

l2meta_cow_start() and l2meta_cow_end() are not necessarily
cluster-aligned if the image has subclusters, so update the
calculation of old_start and old_end to guarantee that no two requests
try to write to the same cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
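
The effect of the rounding can be sketched in isolation. The helper
names below are hypothetical stand-ins for start_of_cluster() and
ROUND_UP(), and the sketch assumes a power-of-two cluster size. It
shows that once the in-flight COW area is widened to whole clusters,
any new request that touches one of those clusters is reported as a
conflict, even if the two byte ranges themselves do not overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round an offset down to the start of its cluster. */
static uint64_t sketch_cluster_start(uint64_t off, uint64_t cluster_size)
{
    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    return off & ~(cluster_size - 1);
}

/* Round an offset up to the next cluster boundary. */
static uint64_t sketch_cluster_end(uint64_t off, uint64_t cluster_size)
{
    return sketch_cluster_start(off + cluster_size - 1, cluster_size);
}

/*
 * Return true if the new request [start, start + bytes) intersects the
 * cluster-aligned range around the in-flight COW area
 * [cow_start, cow_end).
 */
static bool sketch_requests_conflict(uint64_t start, uint64_t bytes,
                                     uint64_t cow_start, uint64_t cow_end,
                                     uint64_t cluster_size)
{
    uint64_t end = start + bytes;
    uint64_t old_start = sketch_cluster_start(cow_start, cluster_size);
    uint64_t old_end = sketch_cluster_end(cow_end, cluster_size);

    return !(end <= old_start || start >= old_end);
}
```

With 64 KiB clusters, an in-flight COW area of [4096, 8192) and a new
2 KiB request at offset 32768 share a cluster but not a byte range.
Without the rounding the two would be treated as independent.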

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ede75138d2..0a40944667 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1279,8 +1279,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
         uint64_t start = guest_offset;
         uint64_t end = start + bytes;
-        uint64_t old_start = l2meta_cow_start(old_alloc);
-        uint64_t old_end = l2meta_cow_end(old_alloc);
+        uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+        uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size);
 
         if (end <= old_start || start >= old_end) {
             /* No intersection */
-- 
2.20.1




* [RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (19 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 15:43   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The L2 bitmap needs to be updated after each write to indicate which
new subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
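
The bitmap update can be sketched as a pure function. The names
below are hypothetical, and the bit layout (allocation bit for
subcluster N at bit N, matching 'all zeroes' bit at bit N + 32) is an
assumption about QCOW_OFLAG_SUB_ALLOC() and QCOW_OFLAG_SUB_ZERO() as
defined earlier in the series. Offsets are relative to the start of
the first cluster of the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the per-subcluster bits in the L2 bitmap. */
#define SKETCH_SUB_ALLOC(sc) (1ULL << (sc))
#define SKETCH_SUB_ZERO(sc)  (SKETCH_SUB_ALLOC(sc) << 32)

/*
 * Return the bitmap of cluster number cluster_index after a write that
 * covered [written_from, written_to). Every subcluster whose start
 * falls inside that range becomes allocated and stops reading as
 * zeroes; all other bits are left untouched.
 */
static uint64_t sketch_mark_written(uint64_t bitmap, unsigned cluster_index,
                                    uint64_t written_from, uint64_t written_to,
                                    uint64_t cluster_size,
                                    unsigned subclusters_per_cluster)
{
    uint64_t subcluster_size;
    unsigned sc;

    assert(subclusters_per_cluster >= 1 && subclusters_per_cluster <= 32);
    subcluster_size = cluster_size / subclusters_per_cluster;

    for (sc = 0; sc < subclusters_per_cluster; sc++) {
        uint64_t sc_off = cluster_index * cluster_size + sc * subcluster_size;
        if (sc_off >= written_from && sc_off < written_to) {
            bitmap |= SKETCH_SUB_ALLOC(sc);
            bitmap &= ~SKETCH_SUB_ZERO(sc);
        }
    }
    return bitmap;
}
```

Because existing bits outside the written range are preserved, the
same function covers both a freshly allocated cluster and one that
was already partially allocated.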

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0a40944667..ed291a4042 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -986,6 +986,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
         set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
                      (cluster_offset + (i << s->cluster_bits)));
+
+        /* Update bitmap with the subclusters that were just written */
+        if (has_subclusters(s)) {
+            unsigned written_from = m->cow_start.offset;
+            unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+                m->nb_clusters << s->cluster_bits;
+            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+            int sc;
+            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+                int sc_off = i * s->cluster_size + sc * s->subcluster_size;
+                if (sc_off >= written_from && sc_off < written_to) {
+                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+                }
+            }
+            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+        }
      }
 
 
-- 
2.20.1




* [RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (20 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 15:46   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ed291a4042..af0f01621c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -789,6 +789,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
     set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+    if (has_subclusters(s)) {
+        set_l2_bitmap(s, l2_slice, l2_index, 0);
+    }
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1




* [RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (21 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 15:56   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also zero-fill the other
subclusters if we can guarantee that we are not overwriting existing
data. However, this would waste more disk space, so we should first
evaluate whether it's really worth doing.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
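
The range that ends up being zeroed can be worked out separately from
the I/O. The sketch below uses hypothetical type and function names
and mirrors only the arithmetic: the range starts at the beginning of
the first COW region and ends at the end of the last one.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchCowRegion {
    unsigned offset;    /* relative to the start of the allocation */
    unsigned nb_bytes;  /* length of the copy-on-write region */
} SketchCowRegion;

typedef struct SketchRange {
    uint64_t start;     /* host offset of the first byte to zero */
    unsigned nb_bytes;  /* number of bytes to zero */
} SketchRange;

/* Compute the host range spanned by the two COW regions. */
static SketchRange sketch_zero_range(uint64_t alloc_offset,
                                     SketchCowRegion cow_start,
                                     SketchCowRegion cow_end)
{
    SketchRange r;
    unsigned end = cow_end.offset + cow_end.nb_bytes;

    assert(end >= cow_start.offset);
    r.start = alloc_offset + cow_start.offset;
    r.nb_bytes = end - cow_start.offset;
    return r;
}
```

As an example, assume 64 KiB clusters with 2 KiB subclusters, COW
regions limited to the touched subcluster, and a 1 KiB write in the
middle of subcluster 5: cow_start is { 10240, 512 } and cow_end is
{ 11776, 512 }, so 2048 bytes are zeroed instead of the whole
65536-byte cluster.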

diff --git a/block/qcow2.c b/block/qcow2.c
index 1d3da0ccf6..242001afa2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2354,6 +2354,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
+        uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+        unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+            m->cow_start.offset;
 
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
             continue;
@@ -2368,16 +2371,14 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
          * efficiently zero out the whole clusters
          */
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-                                            m->nb_clusters * s->cluster_size,
+        ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
                                             true);
         if (ret < 0) {
             return ret;
         }
 
         BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-        ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-                                    m->nb_clusters * s->cluster_size,
+        ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
                                     BDRV_REQ_NO_FALLBACK);
         if (ret < 0) {
             if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1




* [RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (22 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 16:02   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Ideally it should be possible to zero individual subclusters using
this function, but this is currently not implemented.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 242001afa2..0267722065 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3754,6 +3754,12 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
+        /* TODO: allow zeroing separate subclusters, we only allow
+         * zeroing full clusters at the moment. */
+        if (nr != bytes) {
+            qemu_co_mutex_unlock(&s->lock);
+            return -ENOTSUP;
+        }
         if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
             type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
             type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
-- 
2.20.1




* [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (23 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-20 14:12   ` Eric Blake
  2020-02-21 16:44   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
                   ` (2 subsequent siblings)
  27 siblings, 2 replies; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Now that the implementation of subclusters is complete, we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c                    |  65 ++++++++++++++++++--
 block/qcow2.h                    |   8 ++-
 include/block/block_int.h        |   1 +
 qapi/block-core.json             |   7 +++
 tests/qemu-iotests/031.out       |   8 +--
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 +++++++++++++++----------------
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +++---
 tests/qemu-iotests/065           |  18 ++++--
 tests/qemu-iotests/082.out       |  48 ++++++++++++---
 tests/qemu-iotests/085.out       |  38 ++++++------
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +--
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 ++
 tests/qemu-iotests/242.out       |   5 ++
 tests/qemu-iotests/255.out       |   8 +--
 tests/qemu-iotests/273.out       |   9 ++-
 tests/qemu-iotests/common.filter |   1 +
 21 files changed, 245 insertions(+), 118 deletions(-)
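
The cluster size restriction that comes with the new option can be
sketched as a standalone check. The constants are assumptions taken
from elsewhere in QEMU and in this series rather than from this patch:
MIN_CLUSTER_BITS is taken to be 9, MAX_CLUSTER_BITS to be 21, and
QCOW_MAX_SUBCLUSTERS_PER_CLUSTER to be 32. The function name is
hypothetical and error reporting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_CLUSTER_BITS         9   /* assumed: 512-byte minimum */
#define SKETCH_MAX_CLUSTER_BITS         21  /* assumed: 2 MiB maximum */
#define SKETCH_SUBCLUSTERS_PER_CLUSTER  32  /* assumed: one bit pair each */

/* Return true if cluster_size is acceptable for the requested format. */
static bool sketch_cluster_size_ok(uint64_t cluster_size, bool extended_l2)
{
    uint64_t min_size = 1ULL << SKETCH_MIN_CLUSTER_BITS;
    uint64_t max_size = 1ULL << SKETCH_MAX_CLUSTER_BITS;

    /* The cluster size must be a power of two within the valid range. */
    if (cluster_size == 0 || (cluster_size & (cluster_size - 1)) != 0) {
        return false;
    }
    if (cluster_size < min_size || cluster_size > max_size) {
        return false;
    }
    /* Each subcluster must itself be at least the minimum cluster size. */
    if (extended_l2 &&
        cluster_size < min_size * SKETCH_SUBCLUSTERS_PER_CLUSTER) {
        return false;
    }
    return true;
}
```

Under these assumptions the smallest cluster size that can be combined
with extended L2 entries is 16384 bytes, which gives 512-byte
subclusters.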

diff --git a/block/qcow2.c b/block/qcow2.c
index 0267722065..4f26953b1e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1383,6 +1383,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
     s->subcluster_bits = ctz32(s->subcluster_size);
 
+    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
+        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
+        ret = -EINVAL;
+        goto fail;
+    }
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
@@ -2856,6 +2862,11 @@ int qcow2_update_header(BlockDriverState *bs)
                 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
                 .name = "lazy refcounts",
             },
+            {
+                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
+                .name = "extended L2 entries",
+            },
         };
 
         ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -3184,7 +3195,8 @@ static int64_t qcow2_calc_prealloc_size(int64_t total_size,
     return meta_size + aligned_total_size;
 }
 
-static bool validate_cluster_size(size_t cluster_size, Error **errp)
+static bool validate_cluster_size(size_t cluster_size, bool extended_l2,
+                                  Error **errp)
 {
     int cluster_bits = ctz32(cluster_size);
     if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
@@ -3194,16 +3206,28 @@ static bool validate_cluster_size(size_t cluster_size, Error **errp)
                    "%dk", 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
         return false;
     }
+
+    if (extended_l2) {
+        unsigned min_cluster_size =
+            (1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER;
+        if (cluster_size < min_cluster_size) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "cluster sizes of at least %u bytes", min_cluster_size);
+            return false;
+        }
+    }
+
     return true;
 }
 
-static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, bool extended_l2,
+                                             Error **errp)
 {
     size_t cluster_size;
 
     cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
                                          DEFAULT_CLUSTER_SIZE);
-    if (!validate_cluster_size(cluster_size, errp)) {
+    if (!validate_cluster_size(cluster_size, extended_l2, errp)) {
         return 0;
     }
     return cluster_size;
@@ -3316,7 +3340,20 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         cluster_size = DEFAULT_CLUSTER_SIZE;
     }
 
-    if (!validate_cluster_size(cluster_size, errp)) {
+    if (!qcow2_opts->has_extended_l2) {
+        qcow2_opts->extended_l2 = false;
+    }
+    if (qcow2_opts->extended_l2) {
+        if (version < 3) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "compatibility level 1.1 and above (use version=v3 or "
+                       "greater)");
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
+    if (!validate_cluster_size(cluster_size, qcow2_opts->extended_l2, errp)) {
         ret = -EINVAL;
         goto out;
     }
@@ -3436,6 +3473,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
     }
 
+    if (qcow2_opts->extended_l2) {
+        header->incompatible_features |=
+            cpu_to_be64(QCOW2_INCOMPAT_EXTL2);
+    }
+
     ret = blk_pwrite(blk, 0, header, cluster_size, 0);
     g_free(header);
     if (ret < 0) {
@@ -3614,6 +3656,7 @@ static int coroutine_fn qcow2_co_create_opts(const char *filename, QemuOpts *opt
         { BLOCK_OPT_BACKING_FMT,        "backing-fmt" },
         { BLOCK_OPT_CLUSTER_SIZE,       "cluster-size" },
         { BLOCK_OPT_LAZY_REFCOUNTS,     "lazy-refcounts" },
+        { BLOCK_OPT_EXTL2,              "extended-l2" },
         { BLOCK_OPT_REFCOUNT_BITS,      "refcount-bits" },
         { BLOCK_OPT_ENCRYPT,            BLOCK_OPT_ENCRYPT_FORMAT },
         { BLOCK_OPT_COMPAT_LEVEL,       "version" },
@@ -4660,9 +4703,13 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     PreallocMode prealloc;
     bool has_backing_file;
     bool has_luks;
+    bool extended_l2;
 
     /* Parse image creation options */
-    cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
+    extended_l2 = qemu_opt_get_bool_del(opts, BLOCK_OPT_EXTL2, false);
+
+    cluster_size = qcow2_opt_get_cluster_size_del(opts, extended_l2,
+                                                  &local_err);
     if (local_err) {
         goto err;
     }
@@ -4838,6 +4885,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .has_extended_l2    = true,
+            .extended_l2        = has_subclusters(s),
             .refcount_bits      = s->refcount_bits,
             .has_bitmaps        = !!bitmaps,
             .bitmaps            = bitmaps,
@@ -5496,6 +5545,12 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_EXTL2,
+            .type = QEMU_OPT_BOOL,
+            .help = "Extended L2 tables",
+            .def_value_str = "off"
+        },
         {
             .name = BLOCK_OPT_REFCOUNT_BITS,
             .type = QEMU_OPT_NUMBER,
diff --git a/block/qcow2.h b/block/qcow2.h
index ae7973a2c2..e879d54b05 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -231,13 +231,16 @@ enum {
     QCOW2_INCOMPAT_DIRTY_BITNR      = 0,
     QCOW2_INCOMPAT_CORRUPT_BITNR    = 1,
     QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+    QCOW2_INCOMPAT_EXTL2_BITNR      = 3,
     QCOW2_INCOMPAT_DIRTY            = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_CORRUPT          = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
     QCOW2_INCOMPAT_DATA_FILE        = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
+    QCOW2_INCOMPAT_EXTL2            = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
     QCOW2_INCOMPAT_MASK             = QCOW2_INCOMPAT_DIRTY
                                     | QCOW2_INCOMPAT_CORRUPT
-                                    | QCOW2_INCOMPAT_DATA_FILE,
+                                    | QCOW2_INCOMPAT_DATA_FILE
+                                    | QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -525,8 +528,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-    /* FIXME: Return false until this feature is complete */
-    return false;
+    return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index dd033d0b37..e8fbfa6dc8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,7 @@
 #define BLOCK_OPT_REFCOUNT_BITS     "refcount_bits"
 #define BLOCK_OPT_DATA_FILE         "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW     "data_file_raw"
+#define BLOCK_OPT_EXTL2             "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0cf68fea14..e366eba2f2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -66,6 +66,9 @@
 #                 standalone (read-only) raw image without looking at qcow2
 #                 metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries; only valid for
+#               compat >= 1.1 (since 4.2)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -85,6 +88,7 @@
       'compat': 'str',
       '*data-file': 'str',
       '*data-file-raw': 'bool',
+      '*extended-l2': 'bool',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
       'refcount-bits': 'int',
@@ -4372,6 +4376,8 @@
 # @data-file-raw    True if the external data file must stay valid as a
 #                   standalone (read-only) raw image without looking at qcow2
 #                   metadata (default: false; since: 4.0)
+# @extended-l2      True to make the image have extended L2 entries
+#                   (default: false; since 4.2)
 # @size             Size of the virtual disk in bytes
 # @version          Compatibility level (default: v3)
 # @backing-file     File name of the backing file if a backing file
@@ -4390,6 +4396,7 @@
   'data': { 'file':             'BlockdevRef',
             '*data-file':       'BlockdevRef',
             '*data-file-raw':   'bool',
+            '*extended-l2':     'bool',
             'size':             'size',
             '*version':         'BlockdevQcow2Version',
             '*backing-file':    'str',
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 68a74d03b9..614950be56 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -117,7 +117,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -150,7 +150,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -164,7 +164,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   3
-backing_file_offset       0x178
+backing_file_offset       0x1a8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -188,7 +188,7 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index e489b44386..c7e6512b43 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -58,7 +58,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 
@@ -86,7 +86,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 *** done
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 6b505408dd..ba4b42fd58 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 2. Specifying size via -o ==
 
 qemu-img create -f qcow2 -o size=1024 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 3. Invalid sizes ==
 
@@ -124,84 +124,84 @@ and exabytes, respectively.
 == Check correct interpretation of suffixes for cluster size ==
 
 qemu-img create -f qcow2 -o cluster_size=1024 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check compat level option ==
 
 qemu-img create -f qcow2 -o compat=0.10 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.42 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '0.42'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=foobar TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter 'foobar'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check preallocation option ==
 
 qemu-img create -f qcow2 -o preallocation=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=metadata TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '1234'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check encryption option ==
 
 qemu-img create -f qcow2 -o encryption=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 --object secret,id=sec0,data=123456 -o encryption=on,encrypt.key-secret=sec0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check lazy_refcounts option (only with v3) ==
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=on TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Lazy refcounts only supported with compatibility level 1.1 and above (use version=v3 or greater)
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 *** done
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 0f6b0658a1..1f6ae50027 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -20,6 +20,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: true
+    extended l2: false
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot be opened read/write
 no file open, try 'help open'
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index d6a7c2af95..96a6933554 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -26,7 +26,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -84,7 +84,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -140,7 +140,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -195,7 +195,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -264,7 +264,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 65536/65536 bytes at offset 44040192
@@ -298,7 +298,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -327,7 +327,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 131072/131072 bytes at offset 0
@@ -496,6 +496,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 
 === Try changing the external data file ===
@@ -516,6 +517,7 @@ Format specific information:
     data file: foo
     data file raw: false
     corrupt: false
+    extended l2: false
 
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': 'data-file' is required for this image
 image: TEST_DIR/t.IMGFMT
@@ -528,6 +530,7 @@ Format specific information:
     refcount bits: 16
     data file raw: false
     corrupt: false
+    extended l2: false
 
 === Clearing and setting data-file-raw ===
 
@@ -543,6 +546,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: true
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
@@ -555,6 +559,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 qemu-img: data-file-raw cannot be set on existing images
 image: TEST_DIR/t.IMGFMT
@@ -568,5 +573,6 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 5b21eb96bd..7d3a137434 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -95,17 +95,21 @@ class TestQCow3NotLazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
@@ -113,7 +117,8 @@ class TestQCow3NotLazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=off'
     qemu_options = 'lazy-refcounts=on'
     compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 
 class TestQCow3LazyQMP(TestQMP):
@@ -122,7 +127,8 @@ class TestQCow3LazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=on'
     qemu_options = 'lazy-refcounts=off'
     compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 TestImageInfoSpecific = None
 TestQemuImgInfo = None
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 9d4ed4dc9d..2a01e8bac2 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -3,14 +3,14 @@ QA output created by 082
 === create: Options specified more than once ===
 
 Testing: create -f foo -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
 cluster_size: 65536
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -20,9 +20,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -32,9 +33,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -59,6 +61,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -82,6 +85,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -105,6 +109,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -128,6 +133,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -151,6 +157,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -174,6 +181,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -197,6 +205,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -220,6 +229,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -227,10 +237,10 @@ Supported options:
   size=<size>            - Virtual disk size
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,? TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2, -o help TEST_DIR/t.qcow2 128M
 qemu-img: Invalid option list: backing_file=TEST_DIR/t.qcow2,
@@ -258,6 +268,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -279,7 +290,7 @@ qemu-img: Format driver 'bochs' does not support image creation
 === convert: Options specified more than once ===
 
 Testing: create -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: convert -f foo -f qcow2 TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -302,6 +313,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -313,6 +325,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -339,6 +352,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -362,6 +376,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -385,6 +400,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -408,6 +424,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -431,6 +448,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -454,6 +472,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -477,6 +496,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -500,6 +520,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -538,6 +559,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -582,6 +604,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=130M -o lazy_refcounts=off TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -593,6 +616,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=8M -o lazy_refcounts=on -o size=132M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -604,6 +628,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=4M,size=148M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -630,6 +655,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -654,6 +680,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -678,6 +705,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -702,6 +730,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -726,6 +755,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -750,6 +780,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -774,6 +805,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -798,6 +830,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -839,6 +872,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index bb50227b82..8a41314690 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -13,7 +13,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 === Create a single snapshot on virtio0 ===
 
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/1-snapshot-v0.IMGFMT', 'format': 'IMGFMT' } }
-Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Invalid command - missing device and nodename ===
@@ -30,40 +30,40 @@ Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file
 === Create several transactional group snapshots ===
 
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/2-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/2-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/3-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/3-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/4-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/4-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/5-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/5-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/6-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/6-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/7-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/7-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/8-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/8-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/9-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/9-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/10-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/10-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Create a couple of snapshots using blockdev-snapshot ===
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index c7aa2e4820..5d9aceaf13 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -9,7 +9,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/tmp.IMGFMT', 'format': 'IMGFMT' } }
-Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Performing block-commit on active layer ===
@@ -31,6 +31,6 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/
 === Performing Live Snapshot 2 ===
 
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/tmp2.IMGFMT', 'format': 'IMGFMT' } }
-Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 *** done
diff --git a/tests/qemu-iotests/182.out b/tests/qemu-iotests/182.out
index a8eea166c3..84dc7a2360 100644
--- a/tests/qemu-iotests/182.out
+++ b/tests/qemu-iotests/182.out
@@ -13,7 +13,7 @@ Is another process using the image [TEST_DIR/t.qcow2]?
 {'execute': 'blockdev-add', 'arguments': { 'node-name': 'node0', 'driver': 'file', 'filename': 'TEST_DIR/t.IMGFMT', 'locking': 'on' } }
 {"return": {}}
 {'execute': 'blockdev-snapshot-sync', 'arguments': { 'node-name': 'node0', 'snapshot-file': 'TEST_DIR/t.IMGFMT.overlay', 'snapshot-node-name': 'node1' } }
-Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 {'execute': 'blockdev-add', 'arguments': { 'node-name': 'node1', 'driver': 'file', 'filename': 'TEST_DIR/t.IMGFMT', 'locking': 'on' } }
 {"return": {}}
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index 8379ac5854..46b4268b30 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -9,14 +9,14 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
 === Creating backing chain ===
 
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'disk', 'snapshot-file': 'TEST_DIR/t.IMGFMT.mid', 'format': 'IMGFMT', 'mode': 'absolute-paths' } }
-Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'human-monitor-command', 'arguments': { 'command-line': 'qemu-io disk "write 0 4M"' } }
 wrote 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'disk', 'snapshot-file': 'TEST_DIR/t.IMGFMT', 'format': 'IMGFMT', 'mode': 'absolute-paths' } }
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Start commit job and exit qemu ===
@@ -48,7 +48,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 { 'execute': 'drive-mirror', 'arguments': { 'device': 'disk', 'target': 'TEST_DIR/t.IMGFMT.copy', 'format': 'IMGFMT', 'sync': 'full', 'speed': 65536 } }
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
@@ -62,7 +62,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 { 'execute': 'drive-backup', 'arguments': { 'device': 'disk', 'target': 'TEST_DIR/t.IMGFMT.copy', 'format': 'IMGFMT', 'sync': 'full', 'speed': 65536 } }
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
index e86b175e39..e46dccdb08 100644
--- a/tests/qemu-iotests/198.out
+++ b/tests/qemu-iotests/198.out
@@ -72,6 +72,7 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 
 == checking image layer ==
 image: json:{"encrypt.key-secret": "sec1", "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}}
@@ -115,4 +116,5 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 *** done
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index 61e7241e0b..d2efc0394a 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -21,6 +21,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (inline blockdev-add, explicit defaults) ===
 
@@ -43,6 +44,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v3 non-default options) ===
 
@@ -65,6 +67,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 1
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v2 non-default options) ===
 
@@ -141,6 +144,7 @@ Format specific information:
         payload offset: 528384
         master key iters: XXX
     corrupt: false
+    extended l2: false
 
 === Invalid BlockdevRef ===
 
diff --git a/tests/qemu-iotests/242.out b/tests/qemu-iotests/242.out
index 7ac8404d11..0d32dd9148 100644
--- a/tests/qemu-iotests/242.out
+++ b/tests/qemu-iotests/242.out
@@ -15,6 +15,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 No bitmap in JSON format output
 
@@ -40,6 +41,7 @@ Format specific information:
             granularity: 32768
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -77,6 +79,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -119,6 +122,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -162,5 +166,6 @@ Format specific information:
             granularity: 16384
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Test complete
diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
index 348909fdef..4e1b917a0f 100644
--- a/tests/qemu-iotests/255.out
+++ b/tests/qemu-iotests/255.out
@@ -3,9 +3,9 @@ Finishing a commit job with background reads
 
 === Create backing chain and start VM ===
 
-Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 === Start background read requests ===
 
@@ -23,9 +23,9 @@ Closing the VM while a job is being cancelled
 
 === Create images and start VM ===
 
-Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 wrote 1048576/1048576 bytes at offset 0
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/273.out b/tests/qemu-iotests/273.out
index c410fee5c4..b9ffe4dcd7 100644
--- a/tests/qemu-iotests/273.out
+++ b/tests/qemu-iotests/273.out
@@ -44,7 +44,8 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
                             "compat": "1.1",
                             "lazy-refcounts": false,
                             "refcount-bits": 16,
-                            "corrupt": false
+                            "corrupt": false,
+                            "extended-l2": false
                         }
                     },
                     "full-backing-filename": "TEST_DIR/t.IMGFMT.base",
@@ -63,7 +64,8 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
                         "compat": "1.1",
                         "lazy-refcounts": false,
                         "refcount-bits": 16,
-                        "corrupt": false
+                        "corrupt": false,
+                        "extended-l2": false
                     }
                 },
                 "full-backing-filename": "TEST_DIR/t.IMGFMT.mid",
@@ -142,7 +144,8 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
                         "compat": "1.1",
                         "lazy-refcounts": false,
                         "refcount-bits": 16,
-                        "corrupt": false
+                        "corrupt": false,
+                        "extended-l2": false
                     }
                 },
                 "full-backing-filename": "TEST_DIR/t.IMGFMT.base",
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 5367deea39..ec42b9a62b 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -140,6 +140,7 @@ _filter_img_create()
         -e "s# adapter_type=[^ ]*##g" \
         -e "s# hwversion=[^ ]*##g" \
         -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+        -e "s# extended_l2=\\(on\\|off\\)##g" \
         -e "s# block_size=[0-9]\\+##g" \
         -e "s# block_state_zero=\\(on\\|off\\)##g" \
         -e "s# log_size=[0-9]\\+##g" \
-- 
2.20.1




* [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure()
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (24 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 16:52   ` Max Reitz
  2019-12-22 11:37 ` [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
  2020-02-21 17:10 ` [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Max Reitz
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Extended L2 entries are bigger than normal L2 entries so this has an
impact on the amount of metadata needed for a qcow2 file.
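
As a rough illustration of that impact, here is a minimal sketch of the L1/L2
part of the calculation done by qcow2_calc_prealloc_size(). It is not the QEMU
code: header and refcount metadata are left out, the function names are made
up for this example, and the entry sizes (8 bytes for L2E_SIZE_NORMAL, 16 for
L2E_SIZE_EXTENDED) are assumptions, the latter based on the 16-byte stride
that test 271 uses to read L2 entries.

```sh
#!/bin/sh
# Hypothetical helpers, not part of QEMU or of this series.

# round_up VALUE MULTIPLE: round VALUE up to the next multiple of MULTIPLE
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# l1_l2_meta_size TOTAL_SIZE CLUSTER_SIZE L2E_SIZE
# Prints the bytes taken by all L2 tables plus the L1 table.
# Assumption: L1 entries are always 8 bytes (sizeof(uint64_t) in the patch).
l1_l2_meta_size() {
    aligned=$(round_up "$1" "$2")
    # One L2 entry per cluster, rounded up to whole L2 tables
    nl2e=$(round_up $(( aligned / $2 )) $(( $2 / $3 )))
    l2_bytes=$(( nl2e * $3 ))
    # One L1 entry per L2 table, rounded up to whole clusters
    nl1e=$(round_up $(( l2_bytes / $2 )) $(( $2 / 8 )))
    echo $(( l2_bytes + nl1e * 8 ))
}

size=$(( 512 * 1024 * 1024 ))
normal=$(l1_l2_meta_size "$size" 65536 8)
extended=$(l1_l2_meta_size "$size" 65536 16)
echo "normal=$normal extended=$extended delta=$(( extended - normal ))"
```

For a 512 MB image with 64k clusters the delta is 65536 bytes, that is, one
additional L2 table. That is the same as the difference between the two
"fully allocated size" values expected by test 271 in the last patch of this
series.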

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
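
The size check in qcow2_measure() now depends on the L2 entry size as well,
because each L2 table with extended entries covers half as much data. A
minimal sketch of the resulting limit follows. The function name is
hypothetical, and the value of QCOW_MAX_L1_SIZE (32 MiB) is an assumption
that does not appear in this patch; it is consistent with the 1024 TB / 1025
TB cases in test 271.

```sh
#!/bin/sh
# max_virtual_size CLUSTER_SIZE L2E_SIZE
# Prints the largest virtual size (in bytes) that still passes the
# "l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE" check.
# Assumptions: QCOW_MAX_L1_SIZE is 32 MiB and an L1 entry is 8 bytes.
max_virtual_size() {
    max_l2_tables=$(( 32 * 1024 * 1024 / 8 ))
    # entries per L2 table times the bytes covered by one entry
    bytes_per_l2_table=$(( $1 / $2 * $1 ))
    echo $(( max_l2_tables * bytes_per_l2_table ))
}

tib=$(( 1024 * 1024 * 1024 * 1024 ))
echo "64k clusters, normal L2:   $(( $(max_virtual_size 65536 8) / tib )) TiB"
echo "64k clusters, extended L2: $(( $(max_virtual_size 65536 16) / tib )) TiB"
```

Under those assumptions the limit with 64k clusters drops from 2048 TiB to
1024 TiB when extended_l2=on.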

diff --git a/block/qcow2.c b/block/qcow2.c
index 4f26953b1e..62093de1c6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3162,28 +3162,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
                                         size_t cluster_size,
-                                        int refcount_order)
+                                        int refcount_order,
+                                        bool extended_l2)
 {
     int64_t meta_size = 0;
     uint64_t nl1e, nl2e;
     int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+    size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
     /* header: 1 cluster */
     meta_size += cluster_size;
 
     /* total size of L2 tables */
     nl2e = aligned_total_size / cluster_size;
-    nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-    meta_size += nl2e * sizeof(uint64_t);
+    nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+    meta_size += nl2e * l2e_size;
 
     /* total size of L1 tables */
-    nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+    nl1e = nl2e * l2e_size / cluster_size;
     nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
     meta_size += nl1e * sizeof(uint64_t);
 
@@ -4704,6 +4707,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     bool has_backing_file;
     bool has_luks;
     bool extended_l2;
+    size_t l2e_size;
 
     /* Parse image creation options */
     extended_l2 = qemu_opt_get_bool_del(opts, BLOCK_OPT_EXTL2, false);
@@ -4754,8 +4758,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     virtual_size = ROUND_UP(virtual_size, cluster_size);
 
     /* Check that virtual disk size is valid */
+    l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
     l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
-                             cluster_size / sizeof(uint64_t));
+                             cluster_size / l2e_size);
     if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
         error_setg(&local_err, "The image size is too large "
                                "(try using a larger cluster size)");
@@ -4818,9 +4823,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     }
 
     info = g_new(BlockMeasureInfo, 1);
-    info->fully_allocated =
+    info->fully_allocated = luks_payload_size +
         qcow2_calc_prealloc_size(virtual_size, cluster_size,
-                                 ctz32(refcount_bits)) + luks_payload_size;
+                                 ctz32(refcount_bits), extended_l2);
 
     /* Remove data clusters that are not required.  This overestimates the
      * required size because metadata needed for the fully allocated file is
-- 
2.20.1




* [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (25 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
@ 2019-12-22 11:37 ` Alberto Garcia
  2020-02-21 17:04   ` Max Reitz
  2020-02-21 17:10 ` [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Max Reitz
  27 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2019-12-22 11:37 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 tests/qemu-iotests/271     | 256 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/271.out | 208 ++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 465 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out
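
To make the hex dumps in 271.out easier to read, here is a small sketch that
decodes one 16-byte extended L2 entry as printed by _read_l2_entry. The
helper names are hypothetical. The layout is inferred from the expected
output in this version of the series and not taken from the format
specification: bytes 0-7 are the standard cluster descriptor, bytes 8-11
hold the "subcluster is zero" bits, bytes 12-15 hold the "subcluster is
allocated" bits, and the most significant bit of each word stands for
subcluster #0.

```sh
#!/bin/sh
# decode_bitmap_word HEXWORD
# Prints the numbers of the subclusters whose bit is set in a 32-bit word.
# Assumption: the most significant bit corresponds to subcluster #0.
decode_bitmap_word() {
    word=$(( 0x$1 ))
    n=0
    out=""
    while [ "$n" -lt 32 ]; do
        if [ $(( (word >> (31 - n)) & 1 )) -eq 1 ]; then
            out="$out $n"
        fi
        n=$(( n + 1 ))
    done
    echo "${out# }"
}

# decode_l2_entry "b0 b1 ... b15" (the 16 hex bytes of one dump line)
decode_l2_entry() {
    set -- $1    # split the dump into 16 byte tokens
    echo "descriptor: $1$2$3$4$5$6$7$8"
    echo "zero:       $(decode_bitmap_word "$9${10}${11}${12}")"
    echo "allocated:  $(decode_bitmap_word "${13}${14}${15}${16}")"
}

# Entry for cluster #0 after the 'write -q -P 8 63k 4k' step
decode_l2_entry "80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01"
```

For that entry the allocated list comes out as subclusters 0-9, 16 and 31,
which are the ones touched by the writes up to that point.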

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 0000000000..73cdc37bf0
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,256 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019 Igalia, S.L.
+# Author: Alberto Garcia <berto@igalia.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=berto@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    rm -f "$TEST_IMG.raw"
+    rm -f "$TEST_IMG.backing"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+
+IMGOPTS="extended_l2=on"
+l2_offset=262144 # 0x40000
+
+_verify_img()
+{
+    $QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are identical'
+    $QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+        grep -v 'No errors were found on the image'
+}
+
+_read_l2_entry()
+{
+    entry_no=$1
+    nentries=$2
+    offset=$(($l2_offset + $entry_no * 16))
+    length=$((nentries * 16))
+    $QEMU_IO -f raw -c "read -v $offset $length" "$TEST_IMG" | _filter_qemu_io | head -n -2
+}
+
+_test_write()
+{
+    cmd="$1"
+    l2_entry_idx="$2"
+    l2_entry_num="$3"
+    raw_cmd=`echo $1 | sed s/-c//` # Raw images don't support -c
+    echo "$cmd"
+    $QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+    $QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+    _verify_img
+    if [ -n "$l2_entry_idx" ]; then
+        _read_l2_entry "$l2_entry_idx" "$l2_entry_num"
+    fi
+}
+
+_reset_img()
+{
+    $QEMU_IMG create -f raw "$TEST_IMG.raw" 1M | _filter_img_create
+    if [ "$use_backing_file" = "yes" ]; then
+        $QEMU_IMG create -f raw "$TEST_IMG.backing" 1M | _filter_img_create
+        $QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.backing" | _filter_qemu_io
+        $QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.raw" | _filter_qemu_io
+        _make_test_img -b "$TEST_IMG.backing" 1M
+    else
+        _make_test_img 1M
+    fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+    echo
+    echo "### Standard write tests (backing file: $use_backing_file) ###"
+    echo
+    _reset_img
+    ### Write subcluster #0 (beginning of subcluster) ###
+    _test_write 'write -q -P 1 0 1k' 0 1
+
+    ### Write subcluster #1 (middle of subcluster) ###
+    _test_write 'write -q -P 2 3k 512' 0 1
+
+    ### Write subcluster #2 (end of subcluster) ###
+    _test_write 'write -q -P 3 5k 1k' 0 1
+
+    ### Write subcluster #3 (full subcluster) ###
+    _test_write 'write -q -P 4 6k 2k' 0 1
+
+    ### Write subclusters #4-6 (full subclusters) ###
+    _test_write 'write -q -P 5 8k 6k' 0 1
+
+    ### Write subclusters #7-9 (partial subclusters) ###
+    _test_write 'write -q -P 6 15k 4k' 0 1
+
+    ### Write subcluster #16 (partial subcluster) ###
+    _test_write 'write -q -P 7 32k 1k' 0 2
+
+    ### Write subclusters #31-#33 (cluster overlap) ###
+    _test_write 'write -q -P 8 63k 4k' 0 2
+
+    ### Zero subcluster #1 (TODO: use the "all zeros" bit)
+    _test_write 'write -q -z 2k 2k' 0 1
+
+    ### Zero cluster #0
+    _test_write 'write -q -z 0 64k' 0 1
+
+    ### Fill cluster #0 with data
+    _test_write 'write -q -P 9 0 64k' 0 1
+
+    ### Zero and unmap half of cluster #0 (this won't unmap it)
+    _test_write 'write -q -z -u 0 32k' 0 1
+
+    ### Zero and unmap cluster #0
+    _test_write 'write -q -z -u 0 64k' 0 1
+
+    ### Write subcluster #1 (middle of subcluster)
+    _test_write 'write -q -P 10 3k 512' 0 1
+
+    ### Fill cluster #0 with data
+    _test_write 'write -q -P 11 0 64k' 0 1
+
+    ### Discard cluster #0
+    _test_write 'discard -q 0 64k' 0 1
+
+    ### Write compressed data to cluster #0
+    _test_write 'write -q -c -P 12 0 64k' 0 1
+
+    ### Write subcluster #1 (middle of subcluster)
+    _test_write 'write -q -P 13 3k 512' 0 1
+done
+
+# Test that corrupted L2 entries are detected in both read and write
+# operations
+for corruption_test_cmd in read write; do
+    echo
+    echo "### Corrupted L2 entries - $corruption_test_cmd test (allocated) ###"
+    echo
+    echo "# 'cluster is zero' bit set on the standard cluster descriptor"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q 0 2k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+7)) "\x01"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# Both 'subcluster is zero' and 'subcluster is allocated' bits set"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q 0 2k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x80"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "### Corrupted L2 entries - $corruption_test_cmd test (unallocated) ###"
+    echo
+    echo "# 'cluster is zero' bit set on the standard cluster descriptor"
+    echo
+    _make_test_img 1M
+    # Write to cluster #4 in order to initialize the L2 table
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+7)) "\x01"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# 'subcluster is allocated' bit set"
+    echo
+    _make_test_img 1M
+    # Write to cluster #4 in order to initialize the L2 table
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+12)) "\x80"
+    _read_l2_entry 0 1
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# Both 'subcluster is zero' and 'subcluster is allocated' bits set"
+    echo
+    _make_test_img 1M
+    # Write to cluster #4 in order to initialize the L2 table
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x80\x00\x00\x00\x80"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "### Compressed cluster with subcluster bitmap != 0 - $corruption_test_cmd test ###"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q -c 0 64k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x01"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+done
+
+echo
+echo "### Image creation options ###"
+echo
+echo "# cluster_size < 16k"
+IMGOPTS="extended_l2=on,cluster_size=8k" _make_test_img 1M
+
+echo "# backing file and preallocation=metadata"
+IMGOPTS="extended_l2=on,preallocation=metadata" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo "# backing file and preallocation=falloc"
+IMGOPTS="extended_l2=on,preallocation=falloc" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo "# backing file and preallocation=full"
+IMGOPTS="extended_l2=on,preallocation=full" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo
+echo "### qemu-img measure ###"
+echo
+echo "# 512MB, extended_l2=off" # This needs one L2 table
+$QEMU_IMG measure --size 512M -O qcow2 -o extended_l2=off
+echo "# 512MB, extended_l2=on"  # This needs two L2 tables
+$QEMU_IMG measure --size 512M -O qcow2 -o extended_l2=on
+
+echo "# 16K clusters, 64GB, extended_l2=off" # The L1 table fits in one cluster
+$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=off
+echo "# 16K clusters, 64GB, extended_l2=on"  # The L1 table needs two clusters
+$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=on
+
+echo "# 8k clusters" # This should fail
+$QEMU_IMG measure --size 1M -O qcow2 -o cluster_size=8k,extended_l2=on
+
+echo "# 1024 TB" # Maximum allowed size with extended_l2=on and 64K clusters
+$QEMU_IMG measure --size 1024T -O qcow2 -o extended_l2=on
+echo "# 1025 TB" # This should fail
+$QEMU_IMG measure --size 1025T -O qcow2 -o extended_l2=on
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
+
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
new file mode 100644
index 0000000000..7e8cff5e14
--- /dev/null
+++ b/tests/qemu-iotests/271.out
@@ -0,0 +1,208 @@
+QA output created by 271
+
+### Standard write tests (backing file: yes) ###
+
+Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT.backing', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing
+write -q -P 1 0 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 80 00 00 00  ................
+write -q -P 2 3k 512
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -P 3 5k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 e0 00 00 00  ................
+write -q -P 4 6k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 f0 00 00 00  ................
+write -q -P 5 8k 6k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 fe 00 00 00  ................
+write -q -P 6 15k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 00 00  ................
+write -q -P 7 32k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 00  ................
+00040010:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 8 63k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+00040010:  80 00 00 00 00 06 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -z 2k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+write -q -z 0 64k
+00040000:  80 00 00 00 00 05 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 9 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 32k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 10 3k 512
+00040000:  80 00 00 00 00 05 00 00 bf ff ff ff 40 00 00 00  ................
+write -q -P 11 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+discard -q 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -c -P 12 0 64k
+00040000:  40 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 13 3k 512
+00040000:  80 00 00 00 00 07 00 00 00 00 00 00 ff ff ff ff  ................
+
+### Standard write tests (backing file: no) ###
+
+Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+write -q -P 1 0 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 80 00 00 00  ................
+write -q -P 2 3k 512
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -P 3 5k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 e0 00 00 00  ................
+write -q -P 4 6k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 f0 00 00 00  ................
+write -q -P 5 8k 6k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 fe 00 00 00  ................
+write -q -P 6 15k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 00 00  ................
+write -q -P 7 32k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 00  ................
+00040010:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 8 63k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+00040010:  80 00 00 00 00 06 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -z 2k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+write -q -z 0 64k
+00040000:  80 00 00 00 00 05 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 9 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 32k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 10 3k 512
+00040000:  80 00 00 00 00 05 00 00 bf ff ff ff 40 00 00 00  ................
+write -q -P 11 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+discard -q 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -c -P 12 0 64k
+00040000:  40 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 13 3k 512
+00040000:  80 00 00 00 00 07 00 00 00 00 00 00 ff ff ff ff  ................
+
+### Corrupted L2 entries - read test (allocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Corrupted L2 entries - read test (unallocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# 'subcluster is allocated' bit set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+00040000:  00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00  ................
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Compressed cluster with subcluster bitmap != 0 - read test ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Corrupted L2 entries - write test (allocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Corrupted L2 entries - write test (unallocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# 'subcluster is allocated' bit set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+00040000:  00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00  ................
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Compressed cluster with subcluster bitmap != 0 - write test ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Image creation options ###
+
+# cluster_size < 16k
+qemu-img: TEST_DIR/t.IMGFMT: Extended L2 entries are only supported with cluster sizes of at least 16384 bytes
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+# backing file and preallocation=metadata
+qemu-img: TEST_DIR/t.IMGFMT: Backing file and preallocation cannot be used at the same time
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=metadata
+# backing file and preallocation=falloc
+qemu-img: TEST_DIR/t.IMGFMT: Backing file and preallocation cannot be used at the same time
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=falloc
+# backing file and preallocation=full
+qemu-img: TEST_DIR/t.IMGFMT: Backing file and preallocation cannot be used at the same time
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=full
+
+### qemu-img measure ###
+
+# 512MB, extended_l2=off
+required size: 327680
+fully allocated size: 537198592
+# 512MB, extended_l2=on
+required size: 393216
+fully allocated size: 537264128
+# 16K clusters, 64GB, extended_l2=off
+required size: 42008576
+fully allocated size: 68761485312
+# 16K clusters, 64GB, extended_l2=on
+required size: 75579392
+fully allocated size: 68795056128
+# 8k clusters
+qemu-img: Extended L2 entries are only supported with cluster sizes of at least 16384 bytes
+# 1024 TB
+required size: 309285027840
+fully allocated size: 1126209191870464
+# 1025 TB
+qemu-img: The image size is too large (try using a larger cluster size)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 6b10a6a762..3012f171a6 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -283,6 +283,7 @@
 267 rw auto quick snapshot
 268 rw auto quick
 270 rw backing quick
+271 rw auto
 272 rw
 273 backing quick
 277 rw quick
-- 
2.20.1




* Re: [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta()
  2019-12-22 11:36 ` [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2020-02-20 13:28   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 13:28 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



On 22.12.19 12:36, Alberto Garcia wrote:
> handle_alloc() creates a QCowL2Meta structure in order to update the
> image metadata and perform the necessary copy-on-write operations.
> 
> This patch moves that code to a separate function so it can be used
> from other places.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 77 +++++++++++++++++++++++++++++--------------
>  1 file changed, 53 insertions(+), 24 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  2019-12-22 11:36 ` [RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
@ 2020-02-20 13:32   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 13:32 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



On 22.12.19 12:36, Alberto Garcia wrote:
> We are going to need it in other places.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2-cluster.c | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-12-22 11:37 ` [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2020-02-20 14:12   ` Eric Blake
  2020-02-20 14:16     ` Alberto Garcia
  2020-02-21 16:44   ` Max Reitz
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 14:12 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:37 AM, Alberto Garcia wrote:
> Now that the implementation of subclusters is complete we can finally
> add the necessary options to create and read images with this feature,
> which we call "extended L2 entries".
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---

Preliminary review on just interface items (I may do a deeper dive into 
the rest of the patch after getting through the series).

> +++ b/block/qcow2.c
> @@ -1383,6 +1383,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>       s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
>       s->subcluster_bits = ctz32(s->subcluster_size);
>   
> +    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
> +        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
>       /* Check support for various header values */
>       if (header.refcount_order > 6) {
>           error_setg(errp, "Reference count entry width too large; may not "
> @@ -2856,6 +2862,11 @@ int qcow2_update_header(BlockDriverState *bs)
>                   .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
>                   .name = "lazy refcounts",
>               },
> +            {
> +                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
> +                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
> +                .name = "extended L2 entries",
> +            },

I'd sort this to be grouped with the other INCOMPATIBLE bits (after
"external data file"), rather than placing a COMPATIBLE bit in the middle.

Rebase conflict with my patches proposing the addition of an AUTOCLEAR 
bit, here and in the impacted iotests.  Should be trivial to resolve, by 
whoever lands second.

> +++ b/qapi/block-core.json
> @@ -66,6 +66,9 @@
>   #                 standalone (read-only) raw image without looking at qcow2
>   #                 metadata (since: 4.0)
>   #
> +# @extended-l2: true if the image has extended L2 entries; only valid for
> +#               compat >= 1.1 (since 4.2)
> +#

5.0, now.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2020-02-20 14:12   ` Eric Blake
@ 2020-02-20 14:16     ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-20 14:16 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 03:12:19 PM CET, Eric Blake wrote:
>> +            {
>> +                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
>> +                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
>> +                .name = "extended L2 entries",
>> +            },
>
> I'd sort this to be grouped with the other INCOMPATIBLE bits (after
> "external data file"), rather than placing a COMPATIBLE bit in the
> middle.

Ok I'll change that.

> Rebase conflict with my patches proposing the addition of an AUTOCLEAR
> bit, here and in the impacted iotests.  Should be trivial to resolve,
> by whoever lands second.

Sure, although since this is a trivial change this is not that important
at this point (RFC). But of course I'll make sure that the bit is the
correct one.

Berto



* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2019-12-22 11:36 ` [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2020-02-20 14:28   ` Eric Blake
  2020-02-20 14:49     ` Alberto Garcia
  2020-02-20 14:33   ` Eric Blake
  2020-02-20 15:54   ` Max Reitz
  2 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 14:28 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:36 AM, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>   docs/qcow2-cache.txt   | 19 +++++++++++-
>   2 files changed, 83 insertions(+), 4 deletions(-)

This adds a new feature bit; where is the corresponding patch to qcow2.c 
to advertise the feature bit name in the optional feature name table?

/me reads ahead

good, patch 25 covers it.  Quick comment added there as a result.


> +== Extended L2 Entries ==
> +
> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
> +field of the header.
> +
> +In these images standard data clusters are divided into 32 subclusters of the
> +same size. They are contiguous and start from the beginning of the cluster.
> +Subclusters can be allocated independently and the L2 entry contains information
> +indicating the status of each one of them. Compressed data clusters don't have
> +subclusters so they are treated like in images without this feature.

Grammar; I'd suggest:

...don't have subclusters, so they are treated the same as in images 
without this feature.

Are they truly the same, or do you still need to document that the extra 
64 bits of the extended L2 entry are all zero?

> +
> +The size of an extended L2 entry is 128 bits so the number of entries per table
> +is calculated using this formula:
> +
> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
> +
> +The first 64 bits have the same format as the standard L2 table entry described
> +in the previous section, with the exception of bit 0 of the standard cluster
> +descriptor.
> +
> +The last 64 bits contain a subcluster allocation bitmap with this format:
> +
> +Subcluster Allocation Bitmap (for standard clusters):
> +
> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
> +
> +                    1: the subcluster is allocated. In this case the
> +                       host cluster offset field must contain a valid
> +                       offset.
> +                    0: the subcluster is not allocated. In this case
> +                       read requests shall go to the backing file or
> +                       return zeros if there is no backing file data.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 31 - x)
> +

Missing trailing '.'

> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
> +
> +                    1: the subcluster reads as zeros. In this case the
> +                       allocation status bit must be unset. The host
> +                       cluster offset field may or may not be set.

Why must the allocation bit be unset?  When we preallocate, we want a 
cluster to reserve space, but still read as zero, so the combination of 
both bits set makes sense to me.

> +                    0: no effect.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 63 - x)

and again.

> +
> +Subcluster Allocation Bitmap (for compressed clusters):
> +
> +    Bit  0 -  63:   Reserved (set to 0)
> +                    Compressed clusters don't have subclusters,
> +                    so this field is not used.
>   
>   == Snapshots ==
>   
> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> index d57f409861..04eb4ce2f1 100644
> --- a/docs/qcow2-cache.txt
> +++ b/docs/qcow2-cache.txt
> @@ -1,6 +1,6 @@
>   qcow2 L2/refcount cache configuration
>   =====================================
> -Copyright (C) 2015, 2018 Igalia, S.L.
> +Copyright (C) 2015, 2018-2019 Igalia, S.L.

Our review is late; you could add 2020 if desired, now.

>   Author: Alberto Garcia <berto@igalia.com>
>   
>   This work is licensed under the terms of the GNU GPL, version 2 or
> @@ -222,3 +222,20 @@ support this functionality, and is 0 (disabled) on other platforms.
>   This functionality currently relies on the MADV_DONTNEED argument for
>   madvise() to actually free the memory. This is a Linux-specific feature,
>   so cache-clean-interval is not supported on other systems.
> +
> +
> +Extended L2 Entries
> +-------------------
> +All numbers shown in this document are valid for qcow2 images with normal
> +64-bit L2 entries.
> +
> +Images with extended L2 entries need twice as much L2 metadata, so the L2
> +cache size must be twice as large for the same disk space.
> +
> +   disk_size = l2_cache_size * cluster_size / 16
> +
> +i.e.
> +
> +   l2_cache_size = disk_size * 16 / cluster_size
> +
> +Refcount blocks are not affected by this.
> 
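
The cache-size arithmetic in the quoted qcow2-cache.txt hunk can be checked with a small sketch. The helper name l2_cache_size_for() is hypothetical and not part of QEMU; it only assumes what the quoted text states, namely that each L2 entry maps one cluster and occupies 8 bytes normally or 16 bytes with extended L2 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): number of bytes of L2 cache
 * needed to cover disk_size bytes of virtual disk.
 *
 *   normal entries:   disk_size *  8 / cluster_size
 *   extended entries: disk_size * 16 / cluster_size
 */
static uint64_t l2_cache_size_for(uint64_t disk_size, uint64_t cluster_size,
                                  bool extended_l2)
{
    uint64_t entry_size = extended_l2 ? 16 : 8;
    uint64_t clusters;

    assert(cluster_size > 0);

    /* Round up so that a partial last cluster still gets an entry. */
    clusters = disk_size / cluster_size + (disk_size % cluster_size != 0);
    return clusters * entry_size;
}
```

With 64 KiB clusters and a 1 TiB disk this gives 128 MiB for normal entries and 256 MiB for extended ones, which is the "twice as large" statement in the patch.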

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2019-12-22 11:36 ` [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature Alberto Garcia
  2020-02-20 14:28   ` Eric Blake
@ 2020-02-20 14:33   ` Eric Blake
  2020-02-20 16:10     ` Alberto Garcia
  2020-02-20 15:54   ` Max Reitz
  2 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 14:33 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:36 AM, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---

> @@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
>   Given an offset into the virtual disk, the offset into the image file can be
>   obtained as follows:
>   
> -    l2_entries = (cluster_size / sizeof(uint64_t))
> +    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
>   
>       l2_index = (offset / cluster_size) % l2_entries
>       l1_index = (offset / cluster_size) / l2_entries
> @@ -447,6 +455,8 @@ obtained as follows:
>   
>       return cluster_offset + (offset % cluster_size)
>   
> +    [*] this changes if Extended L2 Entries are enabled, see next section

> +The size of an extended L2 entry is 128 bits so the number of entries per table
> +is calculated using this formula:
> +
> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))

Is it worth unifying these statements by writing:

l2_entries = (cluster_size / ((1 + extended_l2) * sizeof(uint64_t)))

or is that too confusing?
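
For illustration only, here is how the unified formula would read as C, together with the index split from the quoted spec text. The function names are made up for this sketch, and extended_l2 is taken to be a plain boolean (0 or 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* l2_entries = cluster_size / ((1 + extended_l2) * sizeof(uint64_t)) */
static size_t l2_entries_per_table(size_t cluster_size, bool extended_l2)
{
    assert(cluster_size >= 2 * sizeof(uint64_t));
    return cluster_size / ((1 + (size_t)extended_l2) * sizeof(uint64_t));
}

/* Split a guest offset into its L1 and L2 indexes, as in the quoted text. */
static void l1_l2_index(uint64_t offset, size_t cluster_size, bool extended_l2,
                        uint64_t *l1_index, uint64_t *l2_index)
{
    uint64_t l2_entries = l2_entries_per_table(cluster_size, extended_l2);
    uint64_t cluster_nr = offset / cluster_size;

    *l2_index = cluster_nr % l2_entries;
    *l1_index = cluster_nr / l2_entries;
}
```

With 64 KiB clusters a table holds 8192 normal entries or 4096 extended ones, so the same guest offset lands in a different L1 slot depending on the feature bit.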

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2020-02-20 14:28   ` Eric Blake
@ 2020-02-20 14:49     ` Alberto Garcia
  2020-02-20 15:16       ` Eric Blake
  0 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2020-02-20 14:49 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 03:28:17 PM CET, Eric Blake wrote:
>> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
>> +field of the header.
>> +
>> +In these images standard data clusters are divided into 32 subclusters of the
>> +same size. They are contiguous and start from the beginning of the cluster.
>> +Subclusters can be allocated independently and the L2 entry contains information
>> +indicating the status of each one of them. Compressed data clusters don't have
>> +subclusters so they are treated like in images without this feature.
>
> Grammar; I'd suggest:
>
> ...don't have subclusters, so they are treated the same as in images 
> without this feature.

Ok

> Are they truly the same, or do you still need to document that the
> extra 64 bits of the extended L2 entry are all zero?

It is documented later in the same patch ("Subcluster Allocation Bitmap
for compressed clusters").

By the way, this series treats an L2 entry as invalid if any of those
bits is not zero, but I think I'll change that. Conceivably those bits
could be used for a future compatible feature, but it can only be
compatible if the previous versions ignore those bits.

>> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
>> +
>> +                    1: the subcluster reads as zeros. In this case the
>> +                       allocation status bit must be unset. The host
>> +                       cluster offset field may or may not be set.
>
> Why must the allocation bit be unset?  When we preallocate, we want a
> cluster to reserve space, but still read as zero, so the combination
> of both bits set makes sense to me.

Since 00 means unallocated and 01 allocated, there are two options left
to represent the "reads as zero" case: 10 and 11.

I think that one could argue for either one and there is no "right"
choice. I chose the former because I understood the allocation bit as
"the guest visible data is obtained from the raw data in that
subcluster" but the other option also makes sense.
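
A minimal sketch of the 00/01/10/11 encoding discussed here, assuming the bit layout exactly as quoted from this RFC version (bit 0 is the least significant bit; allocation bit x is used for subcluster 31 - x and zero bit x for subcluster 63 - x). The enum and function names are hypothetical, and the layout may differ in later revisions of the series.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    SC_UNALLOCATED, /* zero=0 alloc=0: read from backing file (or zeros) */
    SC_ALLOCATED,   /* zero=0 alloc=1: read from the host cluster */
    SC_ZERO,        /* zero=1 alloc=0: reads as zeros */
    SC_INVALID,     /* zero=1 alloc=1: not permitted by the quoted text */
} SubclusterStatus;

/* Decode the status of subcluster sc (0..31) from the 64-bit bitmap. */
static SubclusterStatus subcluster_status(uint64_t bitmap, unsigned sc)
{
    unsigned alloc, zero;

    assert(sc < 32);

    alloc = (bitmap >> (31 - sc)) & 1; /* bits 0-31: allocation status */
    zero = (bitmap >> (63 - sc)) & 1;  /* bits 32-63: reads as zeros */

    if (zero) {
        return alloc ? SC_INVALID : SC_ZERO;
    }
    return alloc ? SC_ALLOCATED : SC_UNALLOCATED;
}
```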

Berto



* Re: [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-12-22 11:36 ` [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2020-02-20 14:53   ` Eric Blake
  2020-02-20 15:00   ` Max Reitz
  2020-02-20 15:19   ` Max Reitz
  2 siblings, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-02-20 14:53 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:36 AM, Alberto Garcia wrote:
> When writing to a qcow2 file there are two functions that take a
> virtual offset and return a host offset, possibly allocating new
> clusters if necessary:
> 
>     - handle_copied() looks for normal data clusters that are already
>       allocated and have a reference count of 1. In those clusters we
>       can simply write the data and there is no need to perform any
>       copy-on-write.
> 
>     - handle_alloc() looks for clusters that do need copy-on-write,
>       either because they haven't been allocated yet, because their
>       reference count is != 1 or because they are ZERO_ALLOC clusters.
> 
> The ZERO_ALLOC case is a bit special because those are clusters that
> are already allocated and they could perfectly be dealt with in
> handle_copied() (as long as copy-on-write is performed when required).
> 
> In fact, there is extra code specifically for them in handle_alloc()
> that tries to reuse the existing allocation if possible and frees them
> otherwise.
> 
> This patch changes the handling of ZERO_ALLOC clusters so the
> semantics of these two functions are now like this:
> 
>     - handle_copied() looks for clusters that are already allocated and
>       which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
>       reference count of 1).
> 
>     - handle_alloc() looks for clusters for which we need a new
>       allocation (all other cases).
> 
> One importante difference after this change is that clusters found in

important

> handle_copied() may now require copy-on-write, but this will be anyway
> necessary once we add support for subclusters.

necessary anyway

> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cluster.c | 226 +++++++++++++++++++++++-------------------
>   1 file changed, 126 insertions(+), 100 deletions(-)

A bit of an increase in code size, but I think it does reduce the 
overall complexity to treat ZERO_ALLOC like normal.  The patch is big, 
but I don't see any sane way to split it.  Overall, I like it.
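
As a reading aid, the new split between handle_copied() and handle_alloc() described in the commit message can be modelled in isolation. This is a simplified sketch, not the code from the patch: the enum and the explicit refcount argument are hypothetical stand-ins (the real code inspects the L2 entry, where a refcount of exactly one is signalled by QCOW_OFLAG_COPIED).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cluster types named in the commit message. */
typedef enum {
    CLUSTER_UNALLOCATED,
    CLUSTER_ZERO_PLAIN,
    CLUSTER_ZERO_ALLOC,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
} ClusterType;

/*
 * Returns true if a write needs a new allocation (handle_alloc() territory),
 * false if the existing cluster can be overwritten in place
 * (handle_copied() territory).
 */
static bool needs_new_alloc(ClusterType type, unsigned refcount)
{
    switch (type) {
    case CLUSTER_NORMAL:
    case CLUSTER_ZERO_ALLOC:
        /* An allocated cluster is referenced at least once. */
        assert(refcount >= 1);
        return refcount != 1;
    default:
        /* Unallocated, plain zero and compressed clusters. */
        return true;
    }
}
```

Note that a ZERO_ALLOC cluster with a refcount of 1 now goes through handle_copied() even though it may still need copy-on-write for the parts of the cluster that are not written.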

> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e078bddcc2..9387f15866 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c

> @@ -1069,18 +1112,20 @@ static void calculate_l2_meta(BlockDriverState *bs,
>       QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>   }
>   
> -/* Returns true if writing to a cluster requires COW */
> -static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
> +/* Returns true if writing to the cluster pointed to by @l2_entry
> + * requires a new allocation (that is, if the cluster is unallocated
> + * or has refcount > 1 and therefore cannot be written in-place). */

The syntax check now wants this comment winged (/* and */ on their own lines).

> +static bool cluster_needs_new_alloc(BlockDriverState *bs, uint64_t l2_entry)

The rename makes sense.

> @@ -1337,9 +1400,10 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
>   }
>   
>   /*
> - * Allocates new clusters for an area that either is yet unallocated or needs a
> - * copy on write. If *host_offset is not INV_OFFSET, clusters are only
> - * allocated if the new allocation can match the specified host offset.
> + * Allocates new clusters for an area that either is yet unallocated or
> + * cannot be overwritten in-place. If *host_offset is not INV_OFFSET,

s/either is yet/is either still/

> + * clusters are only allocated if the new allocation can match the specified
> + * host offset.
>    *
>    * Note that guest_offset may not be cluster aligned. In this case, the
>    * returned *host_offset points to exact byte referenced by guest_offset and

Findings are minor and you can fix them up when dropping RFC.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-12-22 11:36 ` [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
  2020-02-20 14:53   ` Eric Blake
@ 2020-02-20 15:00   ` Max Reitz
  2020-02-20 15:19   ` Max Reitz
  2 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 15:00 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



On 22.12.19 12:36, Alberto Garcia wrote:
> When writing to a qcow2 file there are two functions that take a
> virtual offset and return a host offset, possibly allocating new
> clusters if necessary:
> 
>    - handle_copied() looks for normal data clusters that are already
>      allocated and have a reference count of 1. In those clusters we
>      can simply write the data and there is no need to perform any
>      copy-on-write.
> 
>    - handle_alloc() looks for clusters that do need copy-on-write,
>      either because they haven't been allocated yet, because their
>      reference count is != 1 or because they are ZERO_ALLOC clusters.
> 
> The ZERO_ALLOC case is a bit special because those are clusters that
> are already allocated and they could perfectly be dealt with in
> handle_copied() (as long as copy-on-write is performed when required).
> 
> In fact, there is extra code specifically for them in handle_alloc()
> that tries to reuse the existing allocation if possible and frees them
> otherwise.
> 
> This patch changes the handling of ZERO_ALLOC clusters so the
> semantics of these two functions are now like this:
> 
>    - handle_copied() looks for clusters that are already allocated and
>      which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
>      reference count of 1).
> 
>    - handle_alloc() looks for clusters for which we need a new
>      allocation (all other cases).
> 
> One importante difference after this change is that clusters found in
> handle_copied() may now require copy-on-write, but this will be anyway
> necessary once we add support for subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 226 +++++++++++++++++++++++-------------------
>  1 file changed, 126 insertions(+), 100 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2020-02-20 14:49     ` Alberto Garcia
@ 2020-02-20 15:16       ` Eric Blake
  2020-02-26 16:57         ` Alberto Garcia
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 15:16 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 2/20/20 8:49 AM, Alberto Garcia wrote:
> On Thu 20 Feb 2020 03:28:17 PM CET, Eric Blake wrote:
>>> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
>>> +field of the header.
>>> +
>>> +In these images standard data clusters are divided into 32 subclusters of the
>>> +same size. They are contiguous and start from the beginning of the cluster.
>>> +Subclusters can be allocated independently and the L2 entry contains information
>>> +indicating the status of each one of them. Compressed data clusters don't have
>>> +subclusters so they are treated like in images without this feature.
>>
>> Grammar; I'd suggest:
>>
>> ...don't have subclusters, so they are treated the same as in images
>> without this feature.
> 
> Ok
> 
>> Are they truly the same, or do you still need to document that the
>> extra 64 bits of the extended L2 entry are all zero?
> 
> It is documented later in the same patch ("Subcluster Allocation Bitmap
> for compressed clusters").

Yes, I saw the mention later.  I'm just wondering if we need to 
rearrange text to mention that the bits are reserved (set to 0, ignore 
on read) closer to the point where we document compressed clusters have 
no subclusters.

> 
> By the way, this series treats an L2 entry as invalid if any of those
> bits is not zero, but I think I'll change that. Conceivably those bits
> could be used for a future compatible feature, but it can only be
> compatible if the previous versions ignore those bits.
> 
>>> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
>>> +
>>> +                    1: the subcluster reads as zeros. In this case the
>>> +                       allocation status bit must be unset. The host
>>> +                       cluster offset field may or may not be set.
>>
>> Why must the allocation bit be unset?  When we preallocate, we want a
>> cluster to reserve space, but still read as zero, so the combination
>> of both bits set makes sense to me.
> 
> Since 00 means unallocated and 01 allocated, there are two options left
> to represent the "reads as zero" case: 10 and 11.
> 
> I think that one could argue for either one and there is no "right"
> choice. I chose the former because I understood the allocation bit as
> "the guest visible data is obtained from the raw data in that
> subcluster" but the other option also makes sense.

My argument is that BOTH bit settings make sense:

10 - reads as zero, but subcluster is not allocated
11 - reads as zero, and subcluster is allocated

Oh, I see.  I'm getting confused on the meanings of "allocated". 
Meaning 1: a host address is reserved for the guest address 
(pre-allocation sense).  Meaning 2: guest reads come from this layer 
rather than from the backing layer (COW/COR sense).

Pre-allocation is ALWAYS done a cluster at a time (you only have ONE 
host offset, shared among all 32 subclusters, per L2 entry), so either 
all 32 subclusters have a preallocated location, or none of them do. 
What is left, then, is a determination of whether to read locally or 
from the backing file, AND when reading locally, whether to read from 
the pre-allocated space or to just read zeroes.

We have 8 potential combinations (not all make sense):

host   zero  alloc
   0      0      0   cluster unallocated, subcluster defers to backing
   0      0      1   error (except maybe for external data file)
   0      1      0   cluster unallocated, subcluster reads as zero
   0      1      1   error (except maybe for external data file)
addr      0      0   cluster allocated, subcluster defers to backing
addr      0      1   cluster allocated, subcluster reads from host
addr      1      0   cluster allocated, subcluster reads as zero
addr      1      1   error, or cluster allocated, subcluster reads as zero

Hmm - normally addr is non-zero (because the 0 addr is the metadata 
cluster of qcow2), but with external data file, host addr 0 is required 
for guest offset 0.  How do subclusters play with external data files? 
It makes sense to still have subclusters read as 0 or defer to backing 
with an external file (except maybe when raw external file is set).  But 
you did word it as if the alloc bit is set, the "host cluster offset 
field must contain a valid offset" which includes an offset of 0 for 
external data file.

If we mandate 10 for the reads-as-zero form, then whether addr is valid 
is irrelevant. If we mandate 11 for the reads-as-zero form, then addr 
must be valid even though we don't reference addr.  Having written all 
that, I agree that either form should work, but also that mandating one 
form leaves the door open for a future extension to define meaning to 
the form we did not permit (that is, either 10 or 11 becomes a reserved 
pattern that we can later give meaning to), vs. allowing both forms now 
and locking ourselves out of a future meaning.  And mandating that addr be 
valid even though reading zeroes never uses addr feels odd.

So, I'm okay with your choice of picking 00, 01, and 10 as the mandated 
forms, and declaring 11 as invalid for now (but a possible future 
extension).  Maybe I'll change my mind when seeing what complexity it 
adds to the qcow2 reference implementation, but hopefully not.
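
To make the outcome concrete, here is a sketch of the eight combinations with 11 treated as invalid, as settled on above. The names are hypothetical, and the external data file special case (where a host offset of 0 can be valid) is deliberately ignored; has_host_offset simply says whether the L2 entry carries a valid host cluster offset.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    READ_BACKING,  /* subcluster defers to the backing file */
    READ_HOST,     /* subcluster reads from the host cluster */
    READ_ZERO,     /* subcluster reads as zeros */
    ENTRY_INVALID, /* combination not permitted */
} ReadSource;

/* Classify one subcluster from (host offset present, zero bit, alloc bit). */
static ReadSource classify_subcluster(bool has_host_offset,
                                      unsigned zero, unsigned alloc)
{
    assert(zero <= 1 && alloc <= 1);

    if (zero && alloc) {
        return ENTRY_INVALID; /* 11: reserved for a possible future use */
    }
    if (alloc) {
        /* 01 needs somewhere to read the data from. */
        return has_host_offset ? READ_HOST : ENTRY_INVALID;
    }
    /* 10 and 00 do not depend on the host offset at all. */
    return zero ? READ_ZERO : READ_BACKING;
}
```

The point made in the mail shows up directly in the code: with 10 as the reads-as-zero form, the host offset never needs to be consulted for zero subclusters.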

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-12-22 11:36 ` [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
  2020-02-20 14:53   ` Eric Blake
  2020-02-20 15:00   ` Max Reitz
@ 2020-02-20 15:19   ` Max Reitz
  2 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 15:19 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



I have no idea how I managed to forget some style comments I wrote
during the review, but, anyway:

On 22.12.19 12:36, Alberto Garcia wrote:
> When writing to a qcow2 file there are two functions that take a
> virtual offset and return a host offset, possibly allocating new
> clusters if necessary:
> 
>    - handle_copied() looks for normal data clusters that are already
>      allocated and have a reference count of 1. In those clusters we
>      can simply write the data and there is no need to perform any
>      copy-on-write.
> 
>    - handle_alloc() looks for clusters that do need copy-on-write,
>      either because they haven't been allocated yet, because their
>      reference count is != 1 or because they are ZERO_ALLOC clusters.
> 
> The ZERO_ALLOC case is a bit special because those are clusters that
> are already allocated and they could perfectly be dealt with in
> handle_copied() (as long as copy-on-write is performed when required).
> 
> In fact, there is extra code specifically for them in handle_alloc()
> that tries to reuse the existing allocation if possible and frees them
> otherwise.
> 
> This patch changes the handling of ZERO_ALLOC clusters so the
> semantics of these two functions are now like this:
> 
>    - handle_copied() looks for clusters that are already allocated and
>      which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
>      reference count of 1).
> 
>    - handle_alloc() looks for clusters for which we need a new
>      allocation (all other cases).
> 
> One importante difference after this change is that clusters found in

s/importante/important/

> handle_copied() may now require copy-on-write, but this will be anyway
> necessary once we add support for subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 226 +++++++++++++++++++++++-------------------
>  1 file changed, 126 insertions(+), 100 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e078bddcc2..9387f15866 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c

[...]

> @@ -1035,15 +1040,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>  static void calculate_l2_meta(BlockDriverState *bs,
>                                uint64_t host_cluster_offset,
>                                uint64_t guest_offset, unsigned bytes,
> -                              QCowL2Meta **m, bool keep_old)
> +                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
>  {

[...]

> +    /* Return if there's no COW (all clusters are normal and we keep them) */
> +    if (keep_old) {
> +        int i;
> +        for (i = 0; i < nb_clusters; i++) {
> +            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> +            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
> +                break;
> +            }
> +        }
> +        if (i == nb_clusters) {
> +            return;
> +        }
> +    }
> +
> +    /* Get the L2 entry from the first cluster */

s/from/of/

(Otherwise it sounds a bit like this is the same entry for all clusters)

> +    l2_entry = be64_to_cpu(l2_slice[l2_index]);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
> +
> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> +        cow_start_from = cow_start_to;
> +    } else {
> +        cow_start_from = 0;
> +    }
> +
> +    /* Get the L2 entry from the last cluster */

s/from/of/

> +    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
> +
> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> +        cow_end_to = cow_end_from;
> +    } else {
> +        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
> +    }
>  
>      *m = g_malloc0(sizeof(**m));
>      **m = (QCowL2Meta) {
> @@ -1069,18 +1112,20 @@ static void calculate_l2_meta(BlockDriverState *bs,
>      QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>  }
>  
> -/* Returns true if writing to a cluster requires COW */
> -static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
> +/* Returns true if writing to the cluster pointed to by @l2_entry
> + * requires a new allocation (that is, if the cluster is unallocated
> + * or has refcount > 1 and therefore cannot be written in-place). */

Not sure why Patchew hasn’t complained, but the current coding style
requires /* and */ to be on separate lines for multi-line comments.
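
For readers following the calculate_l2_meta() hunk quoted above, the copy-on-write range selection can be modelled on its own. This is a sketch under assumptions: cow_start_to and cow_end_from are not defined in the quoted hunk, so they are taken here to be the write's start offset within the first cluster and that offset plus the write length. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All offsets are relative to the start of the first touched cluster. */
typedef struct {
    uint64_t start_from, start_to; /* head COW region [start_from, start_to) */
    uint64_t end_from, end_to;     /* tail COW region [end_from, end_to) */
} CowRange;

static uint64_t round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

static CowRange cow_range(uint64_t offset_in_cluster, uint64_t bytes,
                          uint64_t cluster_size, bool first_is_normal,
                          bool last_is_normal, bool keep_old)
{
    CowRange r;

    assert(cluster_size > 0 && offset_in_cluster < cluster_size && bytes > 0);

    r.start_to = offset_in_cluster;       /* assumption, see lead-in */
    r.end_from = offset_in_cluster + bytes;

    /* A NORMAL cluster that we keep needs no COW before the write... */
    r.start_from = (first_is_normal && keep_old) ? r.start_to : 0;
    /* ...and none after it; otherwise COW extends to the cluster end. */
    r.end_to = (last_is_normal && keep_old)
        ? r.end_from
        : round_up(r.end_from, cluster_size);

    return r;
}
```

When both regions come out empty the real function returns early without creating a QCowL2Meta, which is what the keep_old loop at the top of the hunk checks for.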

Max



* Re: [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry()
  2019-12-22 11:36 ` [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
@ 2020-02-20 15:22   ` Eric Blake
  2020-02-20 16:08     ` Alberto Garcia
  2020-02-20 15:39   ` Max Reitz
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 15:22 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:36 AM, Alberto Garcia wrote:
> The size of an L2 entry is 64 bits, but if we want to have subclusters
> we need extended L2 entries. This means that we have to access L2
> tables and slices differently depending on whether an image has
> extended L2 entries or not.
> 
> This patch replaces all l2_slice[] accesses with calls to
> get_l2_entry() and set_l2_entry().
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cluster.c  | 65 ++++++++++++++++++++++--------------------
>   block/qcow2-refcount.c | 17 +++++------
>   block/qcow2.h          | 12 ++++++++
>   3 files changed, 55 insertions(+), 39 deletions(-)
> 

> @@ -978,12 +981,12 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>            * cluster the second one has to do RMW (which is done above by
>            * perform_cow()), update l2 table with its cluster pointer and free
>            * old cluster. This is what this loop does */
> -        if (l2_slice[l2_index + i] != 0) {
> -            old_cluster[j++] = l2_slice[l2_index + i];
> +        if (get_l2_entry(s, l2_slice, l2_index + i) != 0) {
> +            old_cluster[j++] = get_l2_entry(s, l2_slice, l2_index + i);
>           }
>   
> -        l2_slice[l2_index + i] = cpu_to_be64((cluster_offset +
> -                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
> +        set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
> +                     (cluster_offset + (i << s->cluster_bits)));

Cute commutative law use for line length reasons.

> +++ b/block/qcow2.h

scripts/git.orderfile can be used to hoist this part of the patch to the 
front of the message (as it is more valuable to review first).

> @@ -495,6 +495,18 @@ typedef enum QCow2MetadataOverlap {
>   
>   #define INV_OFFSET (-1ULL)
>   
> +static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
> +                                    int idx)
> +{
> +    return be64_to_cpu(l2_slice[idx]);
> +}
> +
> +static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
> +                                int idx, uint64_t entry)
> +{
> +    l2_slice[idx] = cpu_to_be64(entry);
> +}
> +

Reviewed-by: Eric Blake <eblake@redhat.com>
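
The reason for funnelling every access through these two helpers is that the index-to-offset mapping changes once entries grow to 128 bits. The following is only a sketch of how that could look, not code from this series: the sketch_ names, the byte-array slice and the explicit extended_l2 flag are all assumptions made to keep the example self-contained, and load_be64()/store_be64() stand in for be64_to_cpu()/cpu_to_be64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable big-endian load, standing in for be64_to_cpu(). */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Portable big-endian store, standing in for cpu_to_be64(). */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Size in bytes of one L2 entry: 8 normally, 16 with extended entries. */
static size_t sketch_l2_entry_size(bool extended_l2)
{
    return extended_l2 ? 16 : 8;
}

/* First 64 bits of entry idx: the standard cluster descriptor. */
static uint64_t sketch_get_l2_entry(const uint8_t *l2_slice, bool extended_l2,
                                    int idx)
{
    assert(idx >= 0);
    return load_be64(l2_slice + (size_t)idx * sketch_l2_entry_size(extended_l2));
}

static void sketch_set_l2_entry(uint8_t *l2_slice, bool extended_l2,
                                int idx, uint64_t entry)
{
    assert(idx >= 0);
    store_be64(l2_slice + (size_t)idx * sketch_l2_entry_size(extended_l2), entry);
}

/* Last 64 bits of entry idx: the subcluster bitmap (extended entries only). */
static uint64_t sketch_get_l2_bitmap(const uint8_t *l2_slice, int idx)
{
    assert(idx >= 0);
    return load_be64(l2_slice + (size_t)idx * 16 + 8);
}
```

Callers keep using a logical entry index; only the helpers know whether consecutive entries are 8 or 16 bytes apart.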

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function
  2019-12-22 11:36 ` [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function Alberto Garcia
@ 2020-02-20 15:24   ` Eric Blake
  2020-02-20 16:03   ` Max Reitz
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Blake @ 2020-02-20 15:24 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:36 AM, Alberto Garcia wrote:
> This function will be used by the qcow2 code to check if an image has
> subclusters or not.
> 
> At the moment this simply returns false. Once all patches needed for
> subcluster support are ready then QEMU will be able to create and
> read images with subclusters and this function will return the actual
> value.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.h | 6 ++++++
>   1 file changed, 6 insertions(+)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 6823d3f68f..1db3fc5dbc 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -495,6 +495,12 @@ typedef enum QCow2MetadataOverlap {
>   
>   #define INV_OFFSET (-1ULL)
>   
> +static inline bool has_subclusters(BDRVQcow2State *s)
> +{
> +    /* FIXME: Return false until this feature is complete */
> +    return false;
> +}
> +
>   static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
>                                       int idx)
>   {
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State
  2019-12-22 11:36 ` [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
@ 2020-02-20 15:28   ` Eric Blake
  2020-02-20 16:34     ` Alberto Garcia
  2020-02-20 16:15   ` Max Reitz
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 15:28 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 12/22/19 5:36 AM, Alberto Garcia wrote:
> This patch adds the following new fields to BDRVQcow2State:
> 
> - subclusters_per_cluster: Number of subclusters in a cluster
> - subcluster_size: The size of each subcluster, in bytes
> - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size
> 
> Images without subclusters are treated as if they had exactly one,
> with subcluster_size = cluster_size.

The qcow2 spec changes earlier in the series made it sound like your 
choices are exactly 1 or 32,

> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.c | 5 +++++
>   block/qcow2.h | 5 +++++
>   2 files changed, 10 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 3866b47946..cbd857e9c7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1378,6 +1378,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>           }
>       }
>   
> +    s->subclusters_per_cluster =
> +        has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;

which matches your code here (other than the name of the constant)...

> +    s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
> +    s->subcluster_bits = ctz32(s->subcluster_size);
> +
>       /* Check support for various header values */
>       if (header.refcount_order > 6) {
>           error_setg(errp, "Reference count entry width too large; may not "
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 1db3fc5dbc..941330cfc9 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -78,6 +78,8 @@
>   /* The cluster reads as all zeros */
>   #define QCOW_OFLAG_ZERO (1ULL << 0)
>   
> +#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
> +

...but this name sounds like other values (2, 4, 8, 16) might be 
possible?  Is this just leftovers from earlier spins of the series 
before we decided to mandate that clusters must be at least 16k if 
subclusters are enabled (so that subclusters are at least 512 bytes)?

Once we get the right name for the constant, the rest of the patch makes 
sense.
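
For reference, the arithmetic under discussion can be sketched as a
standalone helper. This is only an illustration: the struct, the macro
name and the function are hypothetical, and the field values are derived
from cluster_bits directly rather than with ctz32() as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three new BDRVQcow2State fields. */
typedef struct SketchSubclusterGeometry {
    unsigned subclusters_per_cluster;
    uint64_t subcluster_size;
    unsigned subcluster_bits;
} SketchSubclusterGeometry;

/* With extended L2 entries a cluster always has exactly 32 subclusters. */
#define SKETCH_EXTL2_SUBCLUSTERS_PER_CLUSTER 32

static SketchSubclusterGeometry sketch_subcluster_geometry(unsigned cluster_bits,
                                                           bool extended_l2)
{
    SketchSubclusterGeometry g;

    assert(cluster_bits < 64);
    /* Clusters must be at least 16k so that subclusters are >= 512 bytes. */
    assert(!extended_l2 || cluster_bits >= 14);

    g.subclusters_per_cluster =
        extended_l2 ? SKETCH_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
    /* 32 == 1 << 5, so dividing the cluster size by 32 removes 5 bits. */
    g.subcluster_bits = extended_l2 ? cluster_bits - 5 : cluster_bits;
    g.subcluster_size = UINT64_C(1) << g.subcluster_bits;
    return g;
}
```

With 64k clusters (cluster_bits = 16) and extended L2 entries this gives
2k subclusters; without them the single "subcluster" is the whole cluster.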

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry()
  2019-12-22 11:36 ` [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
  2020-02-20 15:22   ` Eric Blake
@ 2020-02-20 15:39   ` Max Reitz
  1 sibling, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 15:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> The size of an L2 entry is 64 bits, but if we want to have subclusters
> we need extended L2 entries. This means that we have to access L2
> tables and slices differently depending on whether an image has
> extended L2 entries or not.
> 
> This patch replaces all l2_slice[] accesses with calls to
> get_l2_entry() and set_l2_entry().
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c  | 65 ++++++++++++++++++++++--------------------
>  block/qcow2-refcount.c | 17 +++++------
>  block/qcow2.h          | 12 ++++++++
>  3 files changed, 55 insertions(+), 39 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2019-12-22 11:36 ` [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature Alberto Garcia
  2020-02-20 14:28   ` Eric Blake
  2020-02-20 14:33   ` Eric Blake
@ 2020-02-20 15:54   ` Max Reitz
  2020-02-20 16:02     ` Eric Blake
  2 siblings, 1 reply; 80+ messages in thread
From: Max Reitz @ 2020-02-20 15:54 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>  docs/qcow2-cache.txt   | 19 +++++++++++-
>  2 files changed, 83 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index af5711e533..d34261f955 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
>                      as the maximum cluster size and won't be able to open images
>                      with larger cluster sizes.
>  
> +                    Note: if the image has Extended L2 Entries then cluster_bits
> +                    must be at least 14 (i.e. 16384 byte clusters).
> +
>           24 - 31:   size
>                      Virtual disk size in bytes.
>  
> @@ -109,7 +112,12 @@ in the description of a field.
>                                  An External Data File Name header extension may
>                                  be present if this bit is set.
>  
> -                    Bits 3-63:  Reserved (set to 0)
> +                    Bit 3:      Extended L2 Entries.  If this bit is set then

I suppose bit 4 now.  (Compression is bit 3.)

[...]

> +Subcluster Allocation Bitmap (for standard clusters):
> +
> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
> +
> +                    1: the subcluster is allocated. In this case the
> +                       host cluster offset field must contain a valid
> +                       offset.
> +                    0: the subcluster is not allocated. In this case
> +                       read requests shall go to the backing file or
> +                       return zeros if there is no backing file data.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 31 - x)

I still prefer it the other way round, both personally (e.g. it’s the C
ordering), and because other places in qcow2 use LSb for bit ordering
(the refcount order).

I don’t see ease of debugging as a particularly good reason; but then
again, I didn’t have to debug this feature yet (as opposed to you).

But since I’m used to counting bits from the right (because this is how
it’s done basically everywhere), I can’t imagine I would find it more
difficult than counting them from the left.

Max

> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
> +
> +                    1: the subcluster reads as zeros. In this case the
> +                       allocation status bit must be unset. The host
> +                       cluster offset field may or may not be set.
> +                    0: no effect.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 63 - x)



* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2020-02-20 15:54   ` Max Reitz
@ 2020-02-20 16:02     ` Eric Blake
  2020-02-20 16:04       ` Alberto Garcia
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 16:02 UTC (permalink / raw)
  To: Max Reitz, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov,
	Vladimir Sementsov-Ogievskiy, qemu-block

On 2/20/20 9:54 AM, Max Reitz wrote:

>> +Subcluster Allocation Bitmap (for standard clusters):
>> +
>> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
>> +
>> +                    1: the subcluster is allocated. In this case the
>> +                       host cluster offset field must contain a valid
>> +                       offset.
>> +                    0: the subcluster is not allocated. In this case
>> +                       read requests shall go to the backing file or
>> +                       return zeros if there is no backing file data.
>> +
>> +                    Bits are assigned starting from the most significant one.
>> +                    (i.e. bit x is used for subcluster 31 - x)
> 
> I still prefer it the other way round, both personally (e.g. it’s the C
> ordering), and because other places in qcow2 use LSb for bit ordering
> (the refcount order).

Internal consistency with refcount order using LSb ordering is the 
strongest reason to flip things, and have bit x be subcluster x.
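
To make the two orderings concrete, here is a small sketch of the mask
for subcluster 'sc' in the allocation half of the bitmap (bits 0 - 31,
with bit 0 being the least significant one). The function names are
made up for this example and do not appear in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Ordering in this version of the patch: bit x is used for subcluster 31 - x. */
static uint64_t sketch_alloc_mask_msb_first(unsigned sc)
{
    assert(sc < 32);
    return UINT64_C(1) << (31 - sc);
}

/* Ordering proposed in review: bit x is used for subcluster x. */
static uint64_t sketch_alloc_mask_lsb_first(unsigned sc)
{
    assert(sc < 32);
    return UINT64_C(1) << sc;
}
```

With the LSb ordering the mask is a plain shift by the subcluster
index, which is the same convention the refcount order uses.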

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function
  2019-12-22 11:36 ` [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function Alberto Garcia
  2020-02-20 15:24   ` Eric Blake
@ 2020-02-20 16:03   ` Max Reitz
  1 sibling, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 16:03 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> This function will be used by the qcow2 code to check if an image has
> subclusters or not.
> 
> At the moment this simply returns false. Once all patches needed for
> subcluster support are ready then QEMU will be able to create and
> read images with subclusters and this function will return the actual
> value.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 6 ++++++
>  1 file changed, 6 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2020-02-20 16:02     ` Eric Blake
@ 2020-02-20 16:04       ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-20 16:04 UTC (permalink / raw)
  To: Eric Blake, Max Reitz, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov,
	Vladimir Sementsov-Ogievskiy, qemu-block

On Thu 20 Feb 2020 05:02:22 PM CET, Eric Blake wrote:
>>> +                    Bits are assigned starting from the most significant one.
>>> +                    (i.e. bit x is used for subcluster 31 - x)
>> 
>> I still prefer it the other way round, both personally (e.g. it’s the
>> C ordering), and because other places in qcow2 use LSb for bit
>> ordering (the refcount order).
>
> Internal consistency with refcount order using LSb ordering is the
> strongest reason to flip things, and have bit x be subcluster x.

Ok, I think you're both right, I'll change that.

Berto



* Re: [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry()
  2020-02-20 15:22   ` Eric Blake
@ 2020-02-20 16:08     ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-20 16:08 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 04:22:24 PM CET, Eric Blake wrote:
>> +++ b/block/qcow2.h
>
> scripts/git.orderfile can be used to hoist this part of the patch to
> the front of the message (as it is more valuable to review first).

I didn't know that git had this feature, thanks for the tip!

Berto



* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2020-02-20 14:33   ` Eric Blake
@ 2020-02-20 16:10     ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-20 16:10 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 03:33:57 PM CET, Eric Blake wrote:
>>   Given an offset into the virtual disk, the offset into the image file can be
>>   obtained as follows:
>>   
>> -    l2_entries = (cluster_size / sizeof(uint64_t))
>> +    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
>>   
>>       l2_index = (offset / cluster_size) % l2_entries
>>       l1_index = (offset / cluster_size) / l2_entries
>> @@ -447,6 +455,8 @@ obtained as follows:
>>   
>>       return cluster_offset + (offset % cluster_size)
>>   
>> +    [*] this changes if Extended L2 Entries are enabled, see next section
>
>> +The size of an extended L2 entry is 128 bits so the number of entries per table
>> +is calculated using this formula:
>> +
>> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
>
> Is it worth unifying these statements by writing:
>
> l2_entries = (cluster_size / ((1 + extended_l2) * sizeof(uint64_t)))
>
> or is that too confusing?

I think it's too confusing...
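
For comparison, this is roughly what the two formulas look like when
written as code. It is only a sketch of the lookup arithmetic from the
spec text quoted above; the function names are invented for this
example and are not part of qcow2.txt or the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of L2 entries per table: entries are 64 or 128 bits wide. */
static uint64_t sketch_l2_entries(uint64_t cluster_size, bool extended_l2)
{
    uint64_t entry_size = (extended_l2 ? 2 : 1) * sizeof(uint64_t);

    assert(cluster_size >= entry_size);
    return cluster_size / entry_size;
}

/* l2_index = (offset / cluster_size) % l2_entries */
static uint64_t sketch_l2_index(uint64_t offset, uint64_t cluster_size,
                                bool extended_l2)
{
    return (offset / cluster_size) %
           sketch_l2_entries(cluster_size, extended_l2);
}

/* l1_index = (offset / cluster_size) / l2_entries */
static uint64_t sketch_l1_index(uint64_t offset, uint64_t cluster_size,
                                bool extended_l2)
{
    return (offset / cluster_size) /
           sketch_l2_entries(cluster_size, extended_l2);
}
```

With 64k clusters a table holds 8192 standard entries or 4096 extended
ones, so the same guest offset maps to a different L1 index in each case.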

Berto



* Re: [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State
  2019-12-22 11:36 ` [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
  2020-02-20 15:28   ` Eric Blake
@ 2020-02-20 16:15   ` Max Reitz
  1 sibling, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 16:15 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> This patch adds the following new fields to BDRVQcow2State:
> 
> - subclusters_per_cluster: Number of subclusters in a cluster
> - subcluster_size: The size of each subcluster, in bytes
> - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size
> 
> Images without subclusters are treated as if they had exactly one,
> with subcluster_size = cluster_size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 5 +++++
>  block/qcow2.h | 5 +++++
>  2 files changed, 10 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index()
  2019-12-22 11:36 ` [RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index() Alberto Garcia
@ 2020-02-20 16:19   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 16:19 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> For a given offset, return the subcluster number within its cluster
> (i.e. with 32 subclusters per cluster it returns a number between 0
> and 31).
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 5 +++++
>  1 file changed, 5 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>
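
The patch body is not quoted here, so the following is only a sketch of
the calculation the commit message describes, not the actual
offset_to_sc_index(). It takes cluster_bits and subcluster_bits as plain
parameters instead of reading them from BDRVQcow2State, and the function
name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the subcluster that contains 'offset', within its cluster. */
static unsigned sketch_sc_index(uint64_t offset, unsigned cluster_bits,
                                unsigned subcluster_bits)
{
    uint64_t offset_in_cluster;

    assert(subcluster_bits <= cluster_bits && cluster_bits < 64);

    /* Drop the cluster number, then count whole subclusters. */
    offset_in_cluster = offset & ((UINT64_C(1) << cluster_bits) - 1);
    return (unsigned)(offset_in_cluster >> subcluster_bits);
}
```

For 64k clusters with 2k subclusters (cluster_bits = 16,
subcluster_bits = 11) the result is always between 0 and 31; for images
without subclusters (subcluster_bits == cluster_bits) it is always 0.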



* Re: [RFC PATCH v3 09/27] qcow2: Add l2_entry_size()
  2019-12-22 11:36 ` [RFC PATCH v3 09/27] qcow2: Add l2_entry_size() Alberto Garcia
@ 2020-02-20 16:24   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 16:24 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> qcow2 images with subclusters have 128-bit L2 entries. The first 64
> bits contain the same information as traditional images and the last
> 64 bits form a bitmap with the status of each individual subcluster.
> 
> Because of that we cannot assume that L2 entries are sizeof(uint64_t)
> anymore. This function returns the proper value for the image.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c  | 12 ++++++------
>  block/qcow2-refcount.c | 14 ++++++++------
>  block/qcow2.c          |  8 ++++----
>  block/qcow2.h          |  9 +++++++++
>  4 files changed, 27 insertions(+), 16 deletions(-)

Assuming qcow2_calc_prealloc_size() and qcow2_measure are fixed up in
patch 26:

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2019-12-22 11:36 ` [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2020-02-20 16:27   ` Max Reitz
  2020-02-21 13:57     ` Alberto Garcia
  0 siblings, 1 reply; 80+ messages in thread
From: Max Reitz @ 2020-02-20 16:27 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
> 64 bits for the subcluster allocation bitmap.
> 
> In order to support them correctly get/set_l2_entry() need to be
> updated so they take the entry width into account in order to
> calculate the correct offset.
> 
> This patch also adds the get/set_l2_bitmap() functions that are
> used to access the bitmaps. For convenience we allow calling
> get_l2_bitmap() on images without subclusters, although the caller
> does not need and should ignore the returned value.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 8be020bb76..64b0a814f4 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -518,15 +518,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)

[...]

> +static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
> +                                     int idx)
> +{
> +    if (has_subclusters(s)) {
> +        idx *= l2_entry_size(s) / sizeof(uint64_t);
> +        return be64_to_cpu(l2_slice[idx + 1]);
> +    } else {
> +        /* For convenience only; the caller should ignore this value. */
> +        return 0;

Is there a reason you decided not to return the first subcluster as
allocated?  (As you had proposed in v2)

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +    }
> +}
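
To illustrate the layout being accessed: an extended L2 entry is 16
bytes, with the big-endian 64-bit entry first and the big-endian 64-bit
bitmap right after it. The sketch below works on a raw byte buffer so
that it is self-contained; the sketch_* names and the byte-wise helpers
are assumptions for this example and stand in for be64_to_cpu() /
cpu_to_be64() and the real accessors in qcow2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable big-endian load, standing in for be64_to_cpu(). */
static uint64_t sketch_load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Portable big-endian store, standing in for cpu_to_be64(). */
static void sketch_store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* 8 bytes for standard entries, 16 bytes for extended ones. */
static size_t sketch_l2_entry_size(int extended_l2)
{
    return extended_l2 ? 16 : 8;
}

static uint64_t sketch_get_l2_entry(const uint8_t *l2_slice, int extended_l2,
                                    size_t idx)
{
    return sketch_load_be64(l2_slice + idx * sketch_l2_entry_size(extended_l2));
}

static void sketch_set_l2_entry(uint8_t *l2_slice, int extended_l2,
                                size_t idx, uint64_t entry)
{
    sketch_store_be64(l2_slice + idx * sketch_l2_entry_size(extended_l2), entry);
}

static uint64_t sketch_get_l2_bitmap(const uint8_t *l2_slice, int extended_l2,
                                     size_t idx)
{
    if (!extended_l2) {
        /* For convenience only; the caller should ignore this value. */
        return 0;
    }
    /* The bitmap is the second 64-bit word of the 128-bit entry. */
    return sketch_load_be64(l2_slice + idx * 16 + 8);
}

static void sketch_set_l2_bitmap(uint8_t *l2_slice, int extended_l2,
                                 size_t idx, uint64_t bitmap)
{
    /* Setting a bitmap only makes sense with extended L2 entries. */
    assert(extended_l2);
    sketch_store_be64(l2_slice + idx * 16 + 8, bitmap);
}
```

Note that the same byte offset in a slice corresponds to index 2n
without extended entries and to index n with them, which is why the
accessors need to know the entry width.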



* Re: [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State
  2020-02-20 15:28   ` Eric Blake
@ 2020-02-20 16:34     ` Alberto Garcia
  2020-02-20 16:48       ` Eric Blake
  0 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2020-02-20 16:34 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 04:28:07 PM CET, Eric Blake wrote:
>> Images without subclusters are treated as if they had exactly one,
>> with subcluster_size = cluster_size.
>
> The qcow2 spec changes earlier in the series made it sound like your
> choices are exactly 1 or 32,

>> +#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
>> +
>
> ...but this name sounds like other values (2, 4, 8, 16) might be
> possible?

I guess I didn't want to call it QCOW_SUBCLUSTERS_PER_CLUSTER because
there's already BDRVQcow2State.subclusters_per_cluster. And that one can
have two possible values (1 and 32) so 32 would be the maximum.

I get your point, however, and I'm open to suggestions.

Berto



* Re: [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State
  2020-02-20 16:34     ` Alberto Garcia
@ 2020-02-20 16:48       ` Eric Blake
  2020-02-21 13:14         ` Alberto Garcia
  0 siblings, 1 reply; 80+ messages in thread
From: Eric Blake @ 2020-02-20 16:48 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 2/20/20 10:34 AM, Alberto Garcia wrote:
> On Thu 20 Feb 2020 04:28:07 PM CET, Eric Blake wrote:
>>> Images without subclusters are treated as if they had exactly one,
>>> with subcluster_size = cluster_size.
>>
>> The qcow2 spec changes earlier in the series made it sound like your
>> choices are exactly 1 or 32,
> 
>>> +#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
>>> +
>>
>> ...but this name sounds like other values (2, 4, 8, 16) might be
>> possible?
> 
> I guess I didn't want to call it QCOW_SUBCLUSTERS_PER_CLUSTER because
> there's already BDRVQcow2State.subclusters_per_cluster. And that one can
> have two possible values (1 and 32) so 32 would be the maximum.
> 
> I get your point, however, and I'm open to suggestions.

Maybe QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER

since it is a hard-coded property of the EXTL2 feature.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2019-12-22 11:36 ` [RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
@ 2020-02-20 17:21   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-20 17:21 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> This patch adds QCow2SubclusterType, which is the subcluster-level
> version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
> same meaning as their QCOW2_CLUSTER_* equivalents (when they
> exist). See below for details and caveats.
> 
> In images without extended L2 entries clusters are treated as having
> exactly one subcluster so it is possible to replace one data type with
> the other while keeping the exact same semantics.
> 
> With extended L2 entries there are new possible values, and every
> subcluster in the same cluster can obviously have a different
> QCow2SubclusterType so functions need to be adapted to work on the
> subcluster level.
> 
> There are several things that have to be taken into account:
> 
>   a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
>      compressed. We do not support compression at the subcluster
>      level.
> 
>   b) There are two different values for unallocated subclusters:
>      QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
>      cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
>      which means that the cluster is allocated but the subcluster is
>      not. The latter can only happen in images with extended L2
>      entries.
> 
>   c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
>      entry has a value that violates the specification. The caller is
>      responsible for handling these situations.
> 
>      To prevent compatibility problems with images that have invalid
>      values but are currently being read by QEMU without causing side
>      effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
>      with extended L2 entries.
> 
> qcow2_cluster_to_subcluster_type() is added as a separate function
> from qcow2_get_subcluster_type(), but this is only temporary and both
> will be merged in a subsequent patch.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 92 insertions(+)

With the comment style fixed as now required by the coding style (/* and
*/ on separate lines), and regardless of the bit ordering:

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  2019-12-22 11:36 ` [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
@ 2020-02-21 11:35   ` Max Reitz
  2020-02-21 15:14     ` Alberto Garcia
  0 siblings, 1 reply; 80+ messages in thread
From: Max Reitz @ 2020-02-21 11:35 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> In order to support extended L2 entries some functions of the qcow2
> driver need to start dealing with subclusters instead of clusters.
> 
> qcow2_get_cluster_offset() is modified to return the subcluster
> type instead of the cluster type, and all callers are updated to
> replace all values of QCow2ClusterType with their QCow2SubclusterType
> equivalents (as returned by qcow2_cluster_to_subcluster_type()).
> 
> This patch only changes the data types, there are no semantic changes.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 19 +++++-----
>  block/qcow2.c         | 82 +++++++++++++++++++++++++------------------
>  block/qcow2.h         |  3 +-
>  3 files changed, 60 insertions(+), 44 deletions(-)

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index e7607d90d4..9277d680ef 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2223,22 +2227,23 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
>          }
>  
>          qemu_co_mutex_lock(&s->lock);
> -        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
> +        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes,
> +                                       &cluster_offset, &type);

I wonder whether this is kind of a bug fix here.  It’s entirely possible
that @ret isn’t set after this, and then we get to the “out” label,
which has a check on “if (ret == 0)”.

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  2019-12-22 11:36 ` [RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
@ 2020-02-21 12:02   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 12:02 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 22.12.19 12:36, Alberto Garcia wrote:
> When dealing with subcluster types there is a new value called
> QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
> QCow2ClusterType.
> 
> This patch handles that value in all places where subcluster types
> are processed.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State
  2020-02-20 16:48       ` Eric Blake
@ 2020-02-21 13:14         ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-21 13:14 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 05:48:25 PM CET, Eric Blake wrote:
>>> The qcow2 spec changes earlier in the series made it sound like your
>>> choices are exactly 1 or 32,
>> 
>>>> +#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
>>>> +
>>>
>>> ...but this name sounds like other values (2, 4, 8, 16) might be
>>> possible?
>> 
>> I guess I didn't want to call it QCOW_SUBCLUSTERS_PER_CLUSTER because
>> there's already BDRVQcow2State.subclusters_per_cluster. And that one can
>> have two possible values (1 and 32) so 32 would be the maximum.
>> 
>> I get your point, however, and I'm open to suggestions.
>
> Maybe QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER
>
> since it is a hard-coded property of the EXTL2 feature.

Sounds good.

Berto



* Re: [RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta()
  2019-12-22 11:36 ` [RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2020-02-21 13:34   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 13:34 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 1693 bytes --]

On 22.12.19 12:36, Alberto Garcia wrote:
> If an image has subclusters then there are more copy-on-write
> scenarios that we need to consider. Let's say we have a write request
> from the middle of subcluster #3 until the end of the cluster:
> 
>    - If the cluster is new, then subclusters #0 to #3 from the old
>      cluster must be copied into the new one.
> 
>    - If the cluster is new but the old cluster was unallocated, then
>      only subcluster #3 needs copy-on-write. #0 to #2 are marked as
>      unallocated in the bitmap of the new L2 entry.
> 
>    - If we are overwriting an old cluster and subcluster #3 is
>      unallocated or has the all-zeroes bit set then we need
>      copy-on-write on subcluster #3.
> 
>    - If we are overwriting an old cluster and subcluster #3 was
>      allocated then there is no need to copy-on-write.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 140 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 110 insertions(+), 30 deletions(-)

It’s all a bit tough to wrap my head around.  One thing I got
particularly hung up on is how we ensure that for new clusters the
head/tail subcluster bits that do not need COW are initialized to the
correct value.  Then I realized that we just have to keep them as they
are (unallocated or zero, respectively), because this path is only for
when we already have L2 entries, it’s just that they point to normal
non-COPIED clusters.  (So only the L2 offset entry has to be changed,
not the bitmap.  At least not for the subclusters that aren’t touched by
the write.)

Reviewed-by: Max Reitz <mreitz@redhat.com>
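
For readers following along, the four scenarios from the commit message
can be condensed into a small decision function. This is only a sketch:
the type and function names are hypothetical and not taken from the
patch, and the real calculate_l2_meta() has to handle the tail of the
request and more subcluster states than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the identifiers in the patch. */
typedef enum {
    SC_UNALLOCATED,
    SC_ZERO,
    SC_ALLOCATED,
} SubclusterState;

/*
 * Return the offset (within the cluster) at which the head copy-on-write
 * region starts. The region always ends at write_start, so a return value
 * equal to write_start means that no head COW is needed.
 */
static uint32_t head_cow_start(bool new_cluster, bool old_cluster_allocated,
                               SubclusterState first_sc,
                               uint32_t write_start, uint32_t subcluster_size)
{
    /* Start of the subcluster that contains the first written byte */
    uint32_t sc_start = write_start - (write_start % subcluster_size);

    if (new_cluster) {
        /* Old data must be carried over from offset 0; if there was no
         * old cluster, only the partially written subcluster matters. */
        return old_cluster_allocated ? 0 : sc_start;
    }

    /* Overwriting in place: only a subcluster that is unallocated or
     * reads as zeroes needs to be filled in. */
    return first_sc == SC_ALLOCATED ? write_start : sc_start;
}
```

With 2 KiB subclusters and a write starting in the middle of subcluster
#3 (offset 7168), the four cases yield COW regions starting at 0, 6144,
6144 and 7168 (that is, none) respectively.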


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2020-02-20 16:27   ` Max Reitz
@ 2020-02-21 13:57     ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-21 13:57 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 05:27:28 PM CET, Max Reitz wrote:
>> +static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
>> +                                     int idx)
>> +{
>> +    if (has_subclusters(s)) {
>> +        idx *= l2_entry_size(s) / sizeof(uint64_t);
>> +        return be64_to_cpu(l2_slice[idx + 1]);
>> +    } else {
>> +        /* For convenience only; the caller should ignore this value. */
>> +        return 0;
>
> Is there a reason you decided not to return the first subcluster as
> allocated?  (As you had proposed in v2)

Yeah, I thought that it would not make much sense to return a meaningful
value after a comment saying that the caller should ignore it.

If there was a situation in which something depends on that value then
it would be a bug in QEMU.

Berto
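
To make the indexing in the quoted helper easier to follow, here is a
standalone sketch of the same lookup. It assumes that an extended L2
entry is 16 bytes, with the bitmap stored big-endian in the second
64-bit word. FakeState and the fake_* / load_be64 helpers are stand-ins
invented for this example, not QEMU code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the single BDRVQcow2State property consulted here. */
typedef struct {
    bool extended_l2;
} FakeState;

static size_t fake_l2_entry_size(const FakeState *s)
{
    return s->extended_l2 ? 16 : 8;
}

/* Decode a big-endian 64-bit value independently of host byte order. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint64_t fake_get_l2_bitmap(const FakeState *s,
                                   const uint8_t *l2_slice, int idx)
{
    if (!s->extended_l2) {
        /* Placeholder only; callers must not depend on this value. */
        return 0;
    }
    /* Skip idx whole entries, then the 8-byte offset word of this entry. */
    return load_be64(l2_slice + (size_t)idx * fake_l2_entry_size(s) + 8);
}
```

Returning 0 for images without extended L2 entries mirrors the quoted
patch: the value is deliberately meaningless, so code that relies on it
is a bug rather than a supported case.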


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-12-22 11:36 ` [RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
@ 2020-02-21 14:21   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 14:21 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 581 bytes --]

On 22.12.19 12:36, Alberto Garcia wrote:
> The logic of this function remains pretty much the same, except that
> it uses count_contiguous_subclusters(), which combines the logic of
> count_contiguous_clusters() / count_contiguous_clusters_unallocated()
> and checks individual subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 136 ++++++++++++++++++++----------------------
>  block/qcow2.h         |  36 +++++------
>  2 files changed, 80 insertions(+), 92 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-12-22 11:36 ` [RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2020-02-21 14:37   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 14:37 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 407 bytes --]

On 22.12.19 12:36, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-12-22 11:36 ` [RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2020-02-21 14:45   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 14:45 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 396 bytes --]

On 22.12.19 12:36, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2()
  2019-12-22 11:36 ` [RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
@ 2020-02-21 14:47   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 14:47 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 401 bytes --]

On 22.12.19 12:36, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-refcount.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2019-12-22 11:37 ` [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
@ 2020-02-21 14:57   ` Max Reitz
  2020-02-26 17:19     ` Alberto Garcia
  0 siblings, 1 reply; 80+ messages in thread
From: Max Reitz @ 2020-02-21 14:57 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 2328 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> Two changes are needed in order to add subcluster support to this
> function: deallocated clusters must have their bitmaps cleared, and
> expanded clusters must have all the "subcluster allocated" bits set.

Not really, to have real subcluster support it would need to be
expand_zero_subclusters_in_l1().  Right now it can only deal with full
zero clusters, which will actually never happen for images with subclusters.

As noted in v2, this function is only called when downgrading qcow2
images to v2.  It kind of made sense to just call set_l2_bitmap() in v2,
but now with the if () conditional...  I suppose it may make more sense
to assert that the image does not have subclusters at the beginning of
the function and be done with it.

OTOH, well, this does make ensuring that we have subcluster “support”
everywhere a bit easier because this way all set_l2_entry() calls are
accompanied by an “if (subclusters) { set_l2_bitmap() }” part.

But it is dead code.

Max

> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 207f670c94..ede75138d2 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -2054,6 +2054,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>                          /* not backed; therefore we can simply deallocate the
>                           * cluster */
>                          set_l2_entry(s, l2_slice, j, 0);
> +                        if (has_subclusters(s)) {
> +                            set_l2_bitmap(s, l2_slice, j, 0);
> +                        }
>                          l2_dirty = true;
>                          continue;
>                      }
> @@ -2120,6 +2123,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>                  } else {
>                      set_l2_entry(s, l2_slice, j, offset);
>                  }
> +                if (has_subclusters(s)) {
> +                    set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
> +                }
>                  l2_dirty = true;
>              }
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies()
  2019-12-22 11:37 ` [RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
@ 2020-02-21 15:01   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 15:01 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 477 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> l2meta_cow_start() and l2meta_cow_end() are not necessarily
> cluster-aligned if the image has subclusters, so update the
> calculation of old_start and old_end to guarantee that no two requests
> try to write on the same cluster.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
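
A rough illustration of the alignment issue described in the commit
message: if the COW region of an in-flight request is only
subcluster-aligned, it has to be widened to cluster boundaries before
checking for overlap. The helper names below are hypothetical, and the
sketch assumes a power-of-two cluster size; it is not the code from the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both helpers assume that cluster_size is a power of two. */
static uint64_t align_down(uint64_t x, uint64_t cluster_size)
{
    return x & ~(cluster_size - 1);
}

static uint64_t align_up(uint64_t x, uint64_t cluster_size)
{
    return (x + cluster_size - 1) & ~(cluster_size - 1);
}

/*
 * Report whether a new request [new_start, new_end) touches any cluster
 * covered by the COW region [old_cow_start, old_cow_end) of an older
 * request. The old region is widened to whole clusters first.
 */
static bool requests_share_cluster(uint64_t new_start, uint64_t new_end,
                                   uint64_t old_cow_start,
                                   uint64_t old_cow_end,
                                   uint64_t cluster_size)
{
    uint64_t old_start = align_down(old_cow_start, cluster_size);
    uint64_t old_end = align_up(old_cow_end, cluster_size);

    return new_start < old_end && new_end > old_start;
}
```

With 64 KiB clusters, an old COW region of [6144, 8192) and a new
request at [32768, 36864) do not overlap byte-wise, but they land in
the same cluster, so the check reports a dependency.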


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  2020-02-21 11:35   ` Max Reitz
@ 2020-02-21 15:14     ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-21 15:14 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Fri 21 Feb 2020 12:35:55 PM CET, Max Reitz wrote:
>> @@ -2223,22 +2227,23 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
>>          }
>>  
>>          qemu_co_mutex_lock(&s->lock);
>> -        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
>> +        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes,
>> +                                       &cluster_offset, &type);
>
> I wonder whether this is kind of a bug fix here.  It’s entirely possible
> that @ret isn’t set after this, and then we get to the “out” label,
> which has a check on “if (ret == 0)”.

I think that in order to get to "if (ret == 0)" you would first need to
run aio_task_pool_new(), and that codepath guarantees that @ret is set.

Berto


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2019-12-22 11:37 ` [RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2020-02-21 15:43   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 15:43 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 2147 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> The L2 bitmap needs to be updated after each write to indicate what
> new subclusters are now allocated.
> 
> This needs to happen even if the cluster was already allocated and the
> L2 entry was otherwise valid.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 0a40944667..ed291a4042 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -986,6 +986,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>  
>          set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
>                       (cluster_offset + (i << s->cluster_bits)));
> +
> +        /* Update bitmap with the subclusters that were just written */
> +        if (has_subclusters(s)) {
> +            unsigned written_from = m->cow_start.offset;
> +            unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
> +                m->nb_clusters << s->cluster_bits;

I suppose we could also calculate both at the beginning of the function
(I’m not sure whether the compiler can optimize these calculations to
happen only once if we don’t).

> +            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
> +            int sc;
> +            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
> +                int sc_off = i * s->cluster_size + sc * s->subcluster_size;
> +                if (sc_off >= written_from && sc_off < written_to) {
> +                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
> +                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);

Works, but maybe a QCOW_OFLAG_SUB_MASK(sc) would be better for:

l2_bitmap &= ~QCOW_OFLAG_SUB_MASK(sc);
l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);

Nothing wrong though, so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +                }
> +            }
> +            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
> +        }
>       }
>  
>  
> 
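
The QCOW_OFLAG_SUB_MASK(sc) idea suggested above could look roughly
like the sketch below. It assumes the bitmap layout used by the quoted
patch (allocation bits in the low 32 bits, all-zeroes bits in the high
32 bits). The SUB_* macros and mark_written() are illustrative names
only, chosen so they are not mistaken for the real definitions in
qcow2.h.

```c
#include <stdint.h>

/* Assumed layout: bit sc = "allocated", bit sc + 32 = "reads as zeroes". */
#define SUB_ALLOC(sc) (1ULL << (sc))
#define SUB_ZERO(sc)  (SUB_ALLOC(sc) << 32)
/* Hypothetical combined mask covering both status bits of a subcluster. */
#define SUB_MASK(sc)  (SUB_ALLOC(sc) | SUB_ZERO(sc))

/*
 * Mark subclusters first_sc..last_sc (inclusive) as allocated and make
 * sure their all-zeroes bit is cleared.
 */
static uint64_t mark_written(uint64_t l2_bitmap,
                             unsigned first_sc, unsigned last_sc)
{
    for (unsigned sc = first_sc; sc <= last_sc; sc++) {
        l2_bitmap &= ~SUB_MASK(sc);   /* drop both status bits */
        l2_bitmap |= SUB_ALLOC(sc);   /* then set "allocated" only */
    }
    return l2_bitmap;
}
```

Clearing through a single mask and then setting the allocation bit
gives the same result as the two statements in the patch; the benefit
is mainly that both status bits of a subcluster are named in one place.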



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2019-12-22 11:37 ` [RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
@ 2020-02-21 15:46   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 15:46 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 312 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> Compressed clusters always have the bitmap part of the extended L2
> entry set to 0.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space()
  2019-12-22 11:37 ` [RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2020-02-21 15:56   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 15:56 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 751 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> The bdrv_co_pwrite_zeroes() call here fills complete clusters with
> zeroes, but it can happen that some subclusters are not part of the
> write request or the copy-on-write. This patch makes sure that only
> the affected subclusters are overwritten.
> 
> A potential improvement would be to also fill with zeroes the other
> subclusters if we can guarantee that we are not overwriting existing
> data. However this would waste more disk space, so we should first
> evaluate if it's really worth doing.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  2019-12-22 11:37 ` [RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
@ 2020-02-21 16:02   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 16:02 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 345 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> Ideally it should be possible to zero individual subclusters using
> this function, but this is currently not implemented.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 6 ++++++
>  1 file changed, 6 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-12-22 11:37 ` [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
  2020-02-20 14:12   ` Eric Blake
@ 2020-02-21 16:44   ` Max Reitz
  1 sibling, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 16:44 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 1861 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> Now that the implementation of subclusters is complete we can finally
> add the necessary options to create and read images with this feature,
> which we call "extended L2 entries".
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c                    |  65 ++++++++++++++++++--
>  block/qcow2.h                    |   8 ++-
>  include/block/block_int.h        |   1 +
>  qapi/block-core.json             |   7 +++
>  tests/qemu-iotests/031.out       |   8 +--
>  tests/qemu-iotests/036.out       |   4 +-
>  tests/qemu-iotests/049.out       | 102 +++++++++++++++----------------
>  tests/qemu-iotests/060.out       |   1 +
>  tests/qemu-iotests/061.out       |  20 +++---
>  tests/qemu-iotests/065           |  18 ++++--
>  tests/qemu-iotests/082.out       |  48 ++++++++++++---
>  tests/qemu-iotests/085.out       |  38 ++++++------
>  tests/qemu-iotests/144.out       |   4 +-
>  tests/qemu-iotests/182.out       |   2 +-
>  tests/qemu-iotests/185.out       |   8 +--
>  tests/qemu-iotests/198.out       |   2 +
>  tests/qemu-iotests/206.out       |   4 ++
>  tests/qemu-iotests/242.out       |   5 ++
>  tests/qemu-iotests/255.out       |   8 +--
>  tests/qemu-iotests/273.out       |   9 ++-
>  tests/qemu-iotests/common.filter |   1 +
>  21 files changed, 245 insertions(+), 118 deletions(-)

With the .qapi versions adjusted to match $next_release, and with the
bit fixed to be at index 4 instead of 3 (and with the iotests rebases
that always become necessary[1]):

Reviewed-by: Max Reitz <mreitz@redhat.com>

[1] e.g. 280 fails now – I suppose qemu_img_log should filter just like
the bash tests do, but then again, I’d rather drop that function
altogether anyway
(https://lists.nongnu.org/archive/html/qemu-block/2019-10/msg00136.html)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure()
  2019-12-22 11:37 ` [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
@ 2020-02-21 16:52   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 16:52 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 382 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> Extended L2 entries are bigger than normal L2 entries so this has an
> impact on the amount of metadata needed for a qcow2 file.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
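
As a back-of-the-envelope illustration of the metadata impact, the
sketch below estimates the space taken by L2 tables for a given virtual
size. It assumes 8-byte standard entries and 16-byte extended entries,
and rounds the result up to whole clusters. l2_table_bytes() is a
hypothetical helper, not the calculation used in qcow2_measure(), which
also accounts for the L1 table, refcounts and the header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Estimate the bytes needed for L2 tables: one entry per guest cluster,
 * with the total rounded up to a whole number of clusters.
 */
static uint64_t l2_table_bytes(uint64_t virtual_size, uint64_t cluster_size,
                               bool extended_l2)
{
    uint64_t entry_size = extended_l2 ? 16 : 8;
    uint64_t nb_clusters = (virtual_size + cluster_size - 1) / cluster_size;
    uint64_t bytes = nb_clusters * entry_size;

    return (bytes + cluster_size - 1) / cluster_size * cluster_size;
}
```

For a 1 GiB image with 64 KiB clusters this gives 128 KiB of L2 tables
with standard entries and 256 KiB with extended entries.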


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries
  2019-12-22 11:37 ` [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
@ 2020-02-21 17:04   ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-21 17:04 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 1027 bytes --]

On 22.12.19 12:37, Alberto Garcia wrote:
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  tests/qemu-iotests/271     | 256 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/271.out | 208 ++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 465 insertions(+)
>  create mode 100755 tests/qemu-iotests/271
>  create mode 100644 tests/qemu-iotests/271.out

Currently, you’re using the reference output to verify the results.  I
find this rather difficult.

Can this not be written in a way that the test itself verifies the
results?  I realize bit manipulation in bash is hard, which is why I
wonder whether Python may be better suited for the job.

Or maybe at least there could be some way to produce a hexdump-like
result from some more abstract description of what to expect and then
compare the strings.

I suppose I can live with how it is, but I feel like I’d have to do
something in my head that could be better done by a script.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 00/27] Add subcluster allocation to qcow2
  2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (26 preceding siblings ...)
  2019-12-22 11:37 ` [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
@ 2020-02-21 17:10 ` Max Reitz
  2020-02-22 17:59   ` Alberto Garcia
  27 siblings, 1 reply; 80+ messages in thread
From: Max Reitz @ 2020-02-21 17:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


[-- Attachment #1.1: Type: text/plain, Size: 1390 bytes --]

On 22.12.19 12:36, Alberto Garcia wrote:
> Hi,
> 
> here's the new version of the patches to add subcluster allocation
> support to qcow2.
> 
> Please refer to the cover letter of the first version for a full
> description of the patches:
> 
>    https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html
> 
> This version fixes many of the problems highlighted by Max. I decided
> not to replace completely the cluster logic with subcluster logic in
> all cases because I felt that sometimes it only complicated the code.
> Let's see what you think :-)

Looks good overall. :)

So now I wonder what your plans are after this series.  Here are some
things that come to my mind, and I wonder whether you plan to address
them or whether there are more things to do still:

- In v2, you had a patch for preallocation support with backing files.
It didn’t quite work, which is why I think you dropped it for now (why
not, it isn’t crucial).

- There is a TODO on subcluster zeroing.

- I think adding support to amend for switching extended_l2 on or off
would make sense.  But maybe it’s too complicated to be worth the effort.

- As I noted in v2, I think it’d be great if it were possible to run the
iotests with -o extended_l2=on.  But I suppose this kind of depends on
me adding data_file support to the Python tests first...

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 484 bytes --]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 00/27] Add subcluster allocation to qcow2
  2020-02-21 17:10 ` [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Max Reitz
@ 2020-02-22 17:59   ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-22 17:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Fri 21 Feb 2020 06:10:52 PM CET, Max Reitz wrote:

> So now I wonder what your plans are after this series.

Apart from some fixes here and there, there are some things that I would
like to solve:

- I'm not 100% happy with the separation between QCow2ClusterType and
  QCow2SubclusterType. The former is a strict subset of the latter and
  doesn't carry any additional information, so I think there's no need
  to have both in the code. So I'm thinking of getting rid of
  QCow2ClusterType altogether.

- We discussed this already, and related to the previous point, in most
  places where the (sub)cluster type is checked what we want to know is
  whether there is a valid host address, or whether the data reads as
  zeroes, etc. So one possibility is to make qcow2_get_subcluster_type()
  return status flags like the existing BDRV_BLOCK_DATA,
  BDRV_BLOCK_OFFSET_VALID, ... and check those ones instead. Some
  functions become less verbose with this kind of approach, but I'm not
  sure that it works so well with others.

- We also discussed this already, but qcow2_get_cluster_offset() returns
  an offset to the beginning of the cluster. This makes less sense when
  we start working at the subcluster level, but even at the moment the
  reality is that no one uses that offset. All callers use the final
  unaligned host offset. So I have a few patches that change that.
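
To illustrate the last point: what callers end up computing is the
cluster-aligned host offset plus the offset of the request within its
cluster. A minimal sketch, assuming a power-of-two cluster size;
host_offset_for() is a made-up name and not part of the series.

```c
#include <stdint.h>

/*
 * Combine the cluster-aligned host offset with the position of the
 * guest offset inside its cluster. cluster_size must be a power of two.
 */
static uint64_t host_offset_for(uint64_t host_cluster_offset,
                                uint64_t guest_offset,
                                uint64_t cluster_size)
{
    return host_cluster_offset + (guest_offset & (cluster_size - 1));
}
```

Returning this value directly would spare every caller from repeating
the same addition.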

> Here are some things that come to my mind, and I wonder whether you
> plan to address them or whether there are more things to do still:
>
> - In v2, you had a patch for preallocation support with backing files.
> It didn’t quite work, which is why I think you dropped it for now (why
> not, it isn’t crucial).

There was already a problem with preallocation and backing files (
https://lists.gnu.org/archive/html/qemu-block/2019-11/msg00691.html ) so
I decided to withdraw the patches for subclusters and reevaluate the
situation when that was sorted out.

> - There is a TODO on subcluster zeroing.

I'm not sure if I'll fix that now, but I'll give it a try when we all
are happy with the rest of the patches and the general design.

> - I think adding support to amend for switching extended_l2 on or off
> would make sense.  But maybe it’s too complicated to be worth the
> effort.

I haven't thought about that, but it does sound too complicated to be
worth it.

> - As I noted in v2, I think it’d be great if it were possible to run
> the iotests with -o extended_l2=on.  But I suppose this kind of
> depends on me adding data_file support to the Python tests first...

Yes.

Berto


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature
  2020-02-20 15:16       ` Eric Blake
@ 2020-02-26 16:57         ` Alberto Garcia
  0 siblings, 0 replies; 80+ messages in thread
From: Alberto Garcia @ 2020-02-26 16:57 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 20 Feb 2020 04:16:12 PM CET, Eric Blake wrote:

>>>> +In these images standard data clusters are divided into 32 subclusters of the
>>>> +same size. They are contiguous and start from the beginning of the cluster.
>>>> +Subclusters can be allocated independently and the L2 entry contains information
>>>> +indicating the status of each one of them. Compressed data clusters don't have
>>>> +subclusters so they are treated like in images without this feature.
>>>
>>> Grammar; I'd suggest:
>>>
>>> ...don't have subclusters, so they are treated the same as in images
>>> without this feature.
>> 
>> Ok
>> 
>>> Are they truly the same, or do you still need to document that the
>>> extra 64 bits of the extended L2 entry are all zero?
>> 
>> It is documented later in the same patch ("Subcluster Allocation
>> Bitmap for compressed clusters").
>
> Yes, I saw the mention later.  I'm just wondering if we need to
> rearrange text to mention that the bits are reserved (set to 0, ignore
> on read) closer to the point where we document compressed clusters
> have no subclusters.

When I say that "compressed data clusters are treated the same as in
images without this feature" I mean that there are no semantic
changes. I don't think it's necessary to add anything else considering
that the sentence immediately after that one says that the L2 entry size
is now 128 bits, so it's not hard to guess that compressed cluster
descriptors must somehow be affected by this.

> We have 8 potential combinations (not all make sense):
>
> host   zero alloc
>    0      0    0     cluster unallocated, subcluster defers to backing
>    0      0    1     error (except maybe for external data file)

Correct (without the 'maybe')

>    0      1    0     cluster unallocated, subcluster reads as zero
>    0      1    1     error (except maybe for external data file)

This is an error in all cases.

>   addr    0    0     cluster allocated, subcluster defers to backing
>   addr    0    1     cluster allocated, subcluster reads from host
>   addr    1    0     cluster allocated, subcluster reads as zero
>   addr    1    1     error, or cluster allocated, subcluster reads as zero

The last one is also an error.
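
To summarize the table together with the replies above, here is a minimal sketch of the resulting classification. The enum and function names are hypothetical and chosen for illustration only (they are not the identifiers used in the patches), and "cluster allocated" is taken as an input because how it is determined is a separate question:

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the enum used in the patches. */
typedef enum {
    SC_DEFERS_TO_BACKING,  /* zero=0 alloc=0 */
    SC_READS_FROM_HOST,    /* zero=0 alloc=1, cluster allocated */
    SC_READS_AS_ZERO,      /* zero=1 alloc=0 */
    SC_INVALID,            /* every other combination */
} SubclusterStatus;

/*
 * Classify one subcluster from the three inputs in the table:
 * whether the cluster has a valid host offset, and the subcluster's
 * "zero" and "alloc" bits.
 */
static SubclusterStatus classify_subcluster(bool cluster_allocated,
                                            bool zero, bool alloc)
{
    if (zero && alloc) {
        /* 11 is an error in all cases (reserved for a possible future use) */
        return SC_INVALID;
    }
    if (alloc) {
        /* 01 needs a valid host offset, otherwise it is an error */
        return cluster_allocated ? SC_READS_FROM_HOST : SC_INVALID;
    }
    /* 00 and 10 do not depend on the host offset */
    return zero ? SC_READS_AS_ZERO : SC_DEFERS_TO_BACKING;
}
```

Only three of the eight rows describe distinct valid states; the host offset matters solely for the 01 pattern.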

> Hmm - normally addr is non-zero (because the 0 addr is the metadata 
> cluster of qcow2), but with external data file, host addr 0 is required 
> for guest offset 0. How do subclusters play with external data files?

No difference:

    /* ... */ if (!(l2_entry & L2E_OFFSET_MASK)) {
        /* Offset 0 generally means unallocated, but it is ambiguous with
         * external data files because 0 is a valid offset there. However, all
         * clusters in external data files always have refcount 1, so we can
         * rely on QCOW_OFLAG_COPIED to disambiguate. */
        if (has_data_file(bs) && (l2_entry & QCOW_OFLAG_COPIED)) {
            return QCOW2_CLUSTER_NORMAL;
        } else {
            return QCOW2_CLUSTER_UNALLOCATED;
        }
    } /* ... */

This code doesn't change if there are subclusters, and is still used to
determine whether a cluster is allocated or not, and therefore whether
the subcluster allocation bits need to be checked or not.
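
For anyone who wants to experiment with that check outside the tree, here is a standalone sketch of it. The two constants are assumed to match the values in block/qcow2.h, has_data_file is passed as a plain flag instead of being derived from the BlockDriverState, and the function name is made up; compressed clusters and the cluster-level zero flag are deliberately ignored:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match block/qcow2.h; verify against the tree before relying on them. */
#define L2E_OFFSET_MASK   0x00fffffffffffe00ULL
#define QCOW_OFLAG_COPIED (1ULL << 63)

/*
 * Hypothetical helper: returns true if the L2 entry points to an
 * allocated cluster, i.e. if the subcluster allocation bits need to
 * be checked at all.
 */
static bool cluster_is_allocated(uint64_t l2_entry, bool has_data_file)
{
    if (l2_entry & L2E_OFFSET_MASK) {
        /* A non-zero host offset always means allocated */
        return true;
    }
    /*
     * Offset 0 is only a valid host offset with an external data file,
     * and there QCOW_OFLAG_COPIED tells it apart from "unallocated".
     */
    return has_data_file && (l2_entry & QCOW_OFLAG_COPIED) != 0;
}
```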

> It makes sense to still have subclusters read as 0 or defer to backing
> with an external file (except maybe when raw external file is set).
> But you did word it as if the alloc bit is set, the "host cluster
> offset field must contain a valid offset" which includes an offset of
> 0 for external data file.

Yes, that is possible with subclusters (unless there's a bug).

> If we mandate 10 for the reads-as-zero form, then whether addr is
> valid is irrelevant. If we mandate 11 for the reads-as-zero form, then
> addr must be valid even though we don't reference addr.  Having
> written all that, I agree that either form should work, but also that
> mandating one form leaves the door open for a future extension to
> define meaning to the form we did not permit (that is, either 10 or 11
> becomes a reserved pattern that we can later give meaning to),
> vs. allowing both forms now and locking ourselves out of a future
> meaning.  And mandating addr to be valid even when reading zeroes
> doesn't use addr feels odd.

Yes, we definitely don't want to make 10 and 11 synonymous. One of them
should return an error and maybe in the future we can think of a new
meaning.

> So, I'm okay with your choice of picking 00, 01, and 10 as the
> mandated forms, and declaring 11 as invalid for now (but a possible
> future extension).  Maybe I'll change my mind when seeing what
> complexity it adds to the qcow2 reference implementation, but
> hopefully not.

From the implementation point of view there's no difference in
complexity.

Berto



* Re: [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2020-02-21 14:57   ` Max Reitz
@ 2020-02-26 17:19     ` Alberto Garcia
  2020-02-27  9:17       ` Max Reitz
  0 siblings, 1 reply; 80+ messages in thread
From: Alberto Garcia @ 2020-02-26 17:19 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Fri 21 Feb 2020 03:57:27 PM CET, Max Reitz wrote:
> As noted in v2, this function is only called when downgrading qcow2
> images to v2.  It kind of made sense to just call set_l2_bitmap() in
> v2, but now with the if () conditional...  I suppose it may make more
> sense to assert that the image does not have subclusters at the
> beginning of the function and be done with it.

Hmmm, you're right.

> OTOH, well, this does make ensuring that we have subcluster “support”
> everywhere a bit easier because this way all set_l2_entry() calls are
> accompanied by an “if (subclusters) { set_l2_bitmap() }” part.

Another alternative is to assert that the image does not have subclusters
but still leave a comment after both set_l2_entry() calls explaining why
there's no need to touch the bitmap.

I think I'll do that, unless you have a different proposal.
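
Roughly, the pattern would look like the sketch below. This is not the actual patch: ImageState and expand_zero_cluster() are hypothetical stand-ins, and has_subclusters() and set_l2_entry() are simplified stubs that only borrow the names of the real helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the image state */
typedef struct {
    bool extended_l2;
} ImageState;

/* Simplified stubs; the real helpers have different internals */
static bool has_subclusters(const ImageState *s)
{
    return s->extended_l2;
}

static void set_l2_entry(const ImageState *s, uint64_t *l2_slice,
                         int idx, uint64_t entry)
{
    (void)s;
    l2_slice[idx] = entry;
}

/* Hypothetical reduction of the per-entry work done during a downgrade */
static void expand_zero_cluster(const ImageState *s, uint64_t *l2_slice,
                                int idx, uint64_t new_entry)
{
    /* Only reached when downgrading to v2, so no extended L2 entries */
    assert(!has_subclusters(s));

    set_l2_entry(s, l2_slice, idx, new_entry);
    /* No need to touch the L2 bitmap: the image has no subclusters */
}
```

The comment next to set_l2_entry() keeps the "every set_l2_entry() has been audited for subclusters" property that Max mentions, without the dead conditional.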

Berto



* Re: [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2020-02-26 17:19     ` Alberto Garcia
@ 2020-02-27  9:17       ` Max Reitz
  0 siblings, 0 replies; 80+ messages in thread
From: Max Reitz @ 2020-02-27  9:17 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



On 26.02.20 18:19, Alberto Garcia wrote:
> On Fri 21 Feb 2020 03:57:27 PM CET, Max Reitz wrote:
>> As noted in v2, this function is only called when downgrading qcow2
>> images to v2.  It kind of made sense to just call set_l2_bitmap() in
>> v2, but now with the if () conditional...  I suppose it may make more
>> sense to assert that the image does not have subclusters at the
>> beginning of the function and be done with it.
> 
> Hmmm, you're right.
> 
>> OTOH, well, this does make ensuring that we have subcluster “support”
>> everywhere a bit easier because this way all set_l2_entry() calls are
>> accompanied by an “if (subclusters) { set_l2_bitmap() }” part.
> 
> Another alternative is to assert that the image does not have subclusters
> but still leave a comment after both set_l2_entry() calls explaining why
> there's no need to touch the bitmap.

Sounds good.

Max




Thread overview: 80+ messages
2019-12-22 11:36 [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Alberto Garcia
2019-12-22 11:36 ` [RFC PATCH v3 01/27] qcow2: Add calculate_l2_meta() Alberto Garcia
2020-02-20 13:28   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 02/27] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
2020-02-20 13:32   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 03/27] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
2020-02-20 14:53   ` Eric Blake
2020-02-20 15:00   ` Max Reitz
2020-02-20 15:19   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 04/27] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
2020-02-20 15:22   ` Eric Blake
2020-02-20 16:08     ` Alberto Garcia
2020-02-20 15:39   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 05/27] qcow2: Document the Extended L2 Entries feature Alberto Garcia
2020-02-20 14:28   ` Eric Blake
2020-02-20 14:49     ` Alberto Garcia
2020-02-20 15:16       ` Eric Blake
2020-02-26 16:57         ` Alberto Garcia
2020-02-20 14:33   ` Eric Blake
2020-02-20 16:10     ` Alberto Garcia
2020-02-20 15:54   ` Max Reitz
2020-02-20 16:02     ` Eric Blake
2020-02-20 16:04       ` Alberto Garcia
2019-12-22 11:36 ` [RFC PATCH v3 06/27] qcow2: Add dummy has_subclusters() function Alberto Garcia
2020-02-20 15:24   ` Eric Blake
2020-02-20 16:03   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 07/27] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
2020-02-20 15:28   ` Eric Blake
2020-02-20 16:34     ` Alberto Garcia
2020-02-20 16:48       ` Eric Blake
2020-02-21 13:14         ` Alberto Garcia
2020-02-20 16:15   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 08/27] qcow2: Add offset_to_sc_index() Alberto Garcia
2020-02-20 16:19   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 09/27] qcow2: Add l2_entry_size() Alberto Garcia
2020-02-20 16:24   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 10/27] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
2020-02-20 16:27   ` Max Reitz
2020-02-21 13:57     ` Alberto Garcia
2019-12-22 11:36 ` [RFC PATCH v3 11/27] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
2020-02-20 17:21   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 12/27] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
2020-02-21 11:35   ` Max Reitz
2020-02-21 15:14     ` Alberto Garcia
2019-12-22 11:36 ` [RFC PATCH v3 13/27] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
2020-02-21 12:02   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 14/27] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
2020-02-21 13:34   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 15/27] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
2020-02-21 14:21   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 16/27] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
2020-02-21 14:37   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 17/27] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
2020-02-21 14:45   ` Max Reitz
2019-12-22 11:36 ` [RFC PATCH v3 18/27] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
2020-02-21 14:47   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 19/27] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
2020-02-21 14:57   ` Max Reitz
2020-02-26 17:19     ` Alberto Garcia
2020-02-27  9:17       ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 20/27] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
2020-02-21 15:01   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 21/27] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
2020-02-21 15:43   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 22/27] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
2020-02-21 15:46   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 23/27] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
2020-02-21 15:56   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 24/27] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
2020-02-21 16:02   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 25/27] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
2020-02-20 14:12   ` Eric Blake
2020-02-20 14:16     ` Alberto Garcia
2020-02-21 16:44   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 26/27] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
2020-02-21 16:52   ` Max Reitz
2019-12-22 11:37 ` [RFC PATCH v3 27/27] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
2020-02-21 17:04   ` Max Reitz
2020-02-21 17:10 ` [RFC PATCH v3 00/27] Add subcluster allocation to qcow2 Max Reitz
2020-02-22 17:59   ` Alberto Garcia
