QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH v2 00/26] Add subcluster allocation to qcow2
@ 2019-10-26 21:25 Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta() Alberto Garcia
                   ` (26 more replies)
  0 siblings, 27 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

This version includes a few tests, but I'm planning to add more for
the next revision.

Berto

v2:
- Patch 12: Update after the changes in 88f468e546.
- Patch 21 (new): Clear the L2 bitmap when allocating a compressed
  cluster. Compressed clusters should have the bitmap all set to 0.
- Patch 24: Document the new fields in the QAPI documentation [Eric].
- Patch 25: Allow qcow2 preallocation with backing files.
- Patch 26: Add some tests for qcow2 images with extended L2 entries.

v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/26:[----] [-C] 'qcow2: Add calculate_l2_meta()'
002/26:[----] [--] 'qcow2: Split cluster_needs_cow() out of count_cow_clusters()'
003/26:[----] [--] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()'
004/26:[----] [--] 'qcow2: Add get_l2_entry() and set_l2_entry()'
005/26:[----] [--] 'qcow2: Document the Extended L2 Entries feature'
006/26:[----] [--] 'qcow2: Add dummy has_subclusters() function'
007/26:[----] [--] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
008/26:[----] [--] 'qcow2: Add offset_to_sc_index()'
009/26:[----] [--] 'qcow2: Add l2_entry_size()'
010/26:[----] [--] 'qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()'
011/26:[----] [--] 'qcow2: Add qcow2_get_subcluster_type()'
012/26:[0005] [FC] 'qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER'
013/26:[----] [--] 'qcow2: Add subcluster support to calculate_l2_meta()'
014/26:[----] [--] 'qcow2: Add subcluster support to qcow2_get_cluster_offset()'
015/26:[----] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
016/26:[----] [--] 'qcow2: Add subcluster support to discard_in_l2_slice()'
017/26:[----] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
018/26:[----] [--] 'qcow2: Add subcluster support to expand_zero_clusters_in_l1()'
019/26:[----] [--] 'qcow2: Fix offset calculation in handle_dependencies()'
020/26:[----] [--] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
021/26:[down] 'qcow2: Clear the L2 bitmap when allocating a compressed cluster'
022/26:[----] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
023/26:[----] [--] 'qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only'
024/26:[0007] [FC] 'qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit'
025/26:[down] 'qcow2: Allow preallocation and backing files if extended_l2 is set'
026/26:[down] 'iotests: Add tests for qcow2 images with extended L2 entries'

Alberto Garcia (26):
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add qcow2_get_subcluster_type()
  qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  qcow2: Add subcluster support to calculate_l2_meta()
  qcow2: Add subcluster support to qcow2_get_cluster_offset()
  qcow2: Add subcluster support to zero_in_l2_slice()
  qcow2: Add subcluster support to discard_in_l2_slice()
  qcow2: Add subcluster support to check_refcounts_l2()
  qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  qcow2: Fix offset calculation in handle_dependencies()
  qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  qcow2: Clear the L2 bitmap when allocating a compressed cluster
  qcow2: Add subcluster support to handle_alloc_space()
  qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  qcow2: Allow preallocation and backing files if extended_l2 is set
  iotests: Add tests for qcow2 images with extended L2 entries

 block/qcow2-cluster.c            | 548 ++++++++++++++++++++-----------
 block/qcow2-refcount.c           |  38 ++-
 block/qcow2.c                    |  92 +++++-
 block/qcow2.h                    | 121 ++++++-
 docs/interop/qcow2.txt           |  68 +++-
 docs/qcow2-cache.txt             |  19 +-
 include/block/block_int.h        |   1 +
 qapi/block-core.json             |   6 +
 tests/qemu-iotests/031.out       |   8 +-
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 +++---
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +-
 tests/qemu-iotests/065           |  18 +-
 tests/qemu-iotests/082.out       |  48 ++-
 tests/qemu-iotests/085.out       |  38 +--
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +-
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   6 +-
 tests/qemu-iotests/242.out       |   5 +
 tests/qemu-iotests/255.out       |   8 +-
 tests/qemu-iotests/271           | 235 +++++++++++++
 tests/qemu-iotests/271.out       | 183 +++++++++++
 tests/qemu-iotests/common.filter |   1 +
 tests/qemu-iotests/group         |   1 +
 27 files changed, 1247 insertions(+), 340 deletions(-)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-28 12:50   ` Vladimir Sementsov-Ogievskiy
  2019-10-30 12:04   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
                   ` (25 subsequent siblings)
  26 siblings, 2 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 76 +++++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8982b7b762..6c1dcdc781 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1019,6 +1019,55 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
                         QCOW2_DISCARD_NEVER);
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure and
+ * add it to @m.
+ *
+ * @host_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
+                              uint64_t guest_offset, uint64_t bytes,
+                              QCowL2Meta **m, bool keep_old)
+{
+    BDRVQcow2State *s = bs->opaque;
+    unsigned cow_start_from = 0;
+    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+    unsigned cow_end_from = cow_start_to + bytes;
+    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+    QCowL2Meta *old_m = *m;
+
+    *m = g_malloc0(sizeof(**m));
+    **m = (QCowL2Meta) {
+        .next           = old_m,
+
+        .alloc_offset   = host_offset,
+        .offset         = start_of_cluster(s, guest_offset),
+        .nb_clusters    = nb_clusters,
+
+        .keep_old_clusters = keep_old,
+
+        .cow_start = {
+            .offset     = cow_start_from,
+            .nb_bytes   = cow_start_to - cow_start_from,
+        },
+        .cow_end = {
+            .offset     = cow_end_from,
+            .nb_bytes   = cow_end_to - cow_end_from,
+        },
+    };
+
+    qemu_co_queue_init(&(*m)->dependent_requests);
+    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1417,35 +1466,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
     int avail_bytes = nb_clusters << s->cluster_bits;
     int nb_bytes = MIN(requested_bytes, avail_bytes);
-    QCowL2Meta *old_m = *m;
-
-    *m = g_malloc0(sizeof(**m));
-
-    **m = (QCowL2Meta) {
-        .next           = old_m,
-
-        .alloc_offset   = alloc_cluster_offset,
-        .offset         = start_of_cluster(s, guest_offset),
-        .nb_clusters    = nb_clusters,
-
-        .keep_old_clusters  = keep_old_clusters,
-
-        .cow_start = {
-            .offset     = 0,
-            .nb_bytes   = offset_into_cluster(s, guest_offset),
-        },
-        .cow_end = {
-            .offset     = nb_bytes,
-            .nb_bytes   = avail_bytes - nb_bytes,
-        },
-    };
-    qemu_co_queue_init(&(*m)->dependent_requests);
-    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 
     *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                      m, keep_old_clusters);
+
     return 1;
 
 fail:
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-28 13:55   ` Vladimir Sementsov-Ogievskiy
  2019-10-26 21:25 ` [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

We are going to need it in other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6c1dcdc781..aa1010a515 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1068,6 +1068,24 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+    switch (qcow2_get_cluster_type(bs, l2_entry)) {
+    case QCOW2_CLUSTER_NORMAL:
+        if (l2_entry & QCOW_OFLAG_COPIED) {
+            return false;
+        }
+    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return true;
+    default:
+        abort();
+    }
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1080,25 +1098,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
 
     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-        switch(cluster_type) {
-        case QCOW2_CLUSTER_NORMAL:
-            if (l2_entry & QCOW_OFLAG_COPIED) {
-                goto out;
-            }
+        if (!cluster_needs_cow(bs, l2_entry)) {
             break;
-        case QCOW2_CLUSTER_UNALLOCATED:
-        case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
-            break;
-        default:
-            abort();
         }
     }
 
-out:
     assert(i <= nb_clusters);
     return i;
 }
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta() Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-30 14:24   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 04/26] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
     allocated and have a reference count of 1. In those clusters we
     can simply write the data and there is no need to perform any
     copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
     either because they haven't been allocated yet, because their
     reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
     which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
     reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
     allocation (all other cases).

One importante difference after this change is that clusters found in
handle_copied() may now require copy-on-write, but this will be anyway
necessary once we add support for subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 177 +++++++++++++++++++++++-------------------
 1 file changed, 96 insertions(+), 81 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index aa1010a515..ee6b46f917 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1021,7 +1021,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure and
- * add it to @m.
+ * add it to @m. If the write request does not need copy-on-write or
+ * changes to the L2 metadata then this function does nothing.
  *
  * @host_offset points to the beginning of the first cluster.
  *
@@ -1034,15 +1035,51 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  */
 static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
                               uint64_t guest_offset, uint64_t bytes,
-                              QCowL2Meta **m, bool keep_old)
+                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned cow_start_from = 0;
+    int l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry;
+    unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
-    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
+    QCow2ClusterType type;
+
+    /* Return if there's no COW (all clusters are normal and we keep them) */
+    if (keep_old) {
+        int i;
+        for (i = 0; i < nb_clusters; i++) {
+            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+                break;
+            }
+        }
+        if (i == nb_clusters) {
+            return;
+        }
+    }
+
+    /* Get the L2 entry from the first cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_start_from = cow_start_to;
+    } else {
+        cow_start_from = 0;
+    }
+
+    /* Get the L2 entry from the last cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_end_to = cow_end_from;
+    } else {
+        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    }
 
     *m = g_malloc0(sizeof(**m));
     **m = (QCowL2Meta) {
@@ -1068,18 +1105,18 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing to a cluster requires COW */
+/* Returns true if the cluster is unallocated or has refcount > 1 */
 static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 {
     switch (qcow2_get_cluster_type(bs, l2_entry)) {
     case QCOW2_CLUSTER_NORMAL:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
         if (l2_entry & QCOW_OFLAG_COPIED) {
             return false;
         }
     case QCOW2_CLUSTER_UNALLOCATED:
     case QCOW2_CLUSTER_COMPRESSED:
     case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
         return true;
     default:
         abort();
@@ -1087,20 +1124,34 @@ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 }
 
 /*
- * Returns the number of contiguous clusters that can be used for an allocating
- * write, but require COW to be performed (this includes yet unallocated space,
- * which must copy from the backing file)
+ * Returns the number of contiguous clusters that can be written to
+ * using one single write request, starting from @l2_index.
+ * At most @nb_clusters are checked.
+ *
+ * If @want_cow is true this counts clusters that are either
+ * unallocated, or allocated but with refcount > 1.
+ *
+ * If @want_cow is false this counts clusters that are already
+ * allocated and can be written to using their current locations
+ * (including QCOW2_CLUSTER_ZERO_ALLOC).
  */
 static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
-    uint64_t *l2_slice, int l2_index)
+                              uint64_t *l2_slice, int l2_index, bool want_cow)
 {
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        if (!cluster_needs_cow(bs, l2_entry)) {
+        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        if (cluster_needs_cow(bs, l2_entry) != want_cow) {
             break;
         }
+        if (!want_cow && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            break;
+        }
+        expected_offset += s->cluster_size;
     }
 
     assert(i <= nb_clusters);
@@ -1228,18 +1279,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     cluster_offset = be64_to_cpu(l2_slice[l2_index]);
 
-    /* Check how many clusters are already allocated and don't need COW */
-    if (qcow2_get_cluster_type(bs, cluster_offset) == QCOW2_CLUSTER_NORMAL
-        && (cluster_offset & QCOW_OFLAG_COPIED))
-    {
+    if (!cluster_needs_cow(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
         bool offset_matches =
             (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
 
         if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
+            qcow2_signal_corruption(bs, true, -1, -1, "%s cluster offset "
                                     "%#llx unaligned (guest offset: %#" PRIx64
-                                    ")", cluster_offset & L2E_OFFSET_MASK,
+                                    ")", cluster_offset & QCOW_OFLAG_ZERO ?
+                                    "Preallocated zero" : "Data",
+                                    cluster_offset & L2E_OFFSET_MASK,
                                     guest_offset);
             ret = -EIO;
             goto out;
@@ -1252,15 +1302,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         }
 
         /* We keep all QCOW_OFLAG_COPIED clusters */
-        keep_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index],
-                                      QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
+        keep_clusters = count_cow_clusters(bs, nb_clusters, l2_slice,
+                                           l2_index, false);
         assert(keep_clusters <= nb_clusters);
 
         *bytes = MIN(*bytes,
                  keep_clusters * s->cluster_size
                  - offset_into_cluster(s, guest_offset));
+        assert(*bytes != 0);
+
+        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
+                          *bytes, l2_slice, m, true);
 
         ret = 1;
     } else {
@@ -1361,12 +1413,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     BDRVQcow2State *s = bs->opaque;
     int l2_index;
     uint64_t *l2_slice;
-    uint64_t entry;
     uint64_t nb_clusters;
     int ret;
-    bool keep_old_clusters = false;
 
-    uint64_t alloc_cluster_offset = INV_OFFSET;
+    uint64_t alloc_cluster_offset;
 
     trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
                              *bytes);
@@ -1392,67 +1442,31 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    entry = be64_to_cpu(l2_slice[l2_index]);
-    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index);
+    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index, true);
 
     /* This function is only called when there were no non-COW clusters, so if
      * we can't find any unallocated or COW clusters either, something is
      * wrong with our code. */
     assert(nb_clusters > 0);
 
-    if (qcow2_get_cluster_type(bs, entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
-        (entry & QCOW_OFLAG_COPIED) &&
-        (*host_offset == INV_OFFSET ||
-         start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
-    {
-        int preallocated_nb_clusters;
-
-        if (offset_into_cluster(s, entry & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Preallocated zero "
-                                    "cluster offset %#llx unaligned (guest "
-                                    "offset: %#" PRIx64 ")",
-                                    entry & L2E_OFFSET_MASK, guest_offset);
-            ret = -EIO;
-            goto fail;
-        }
-
-        /* Try to reuse preallocated zero clusters; contiguous normal clusters
-         * would be fine, too, but count_cow_clusters() above has limited
-         * nb_clusters already to a range of COW clusters */
-        preallocated_nb_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_COPIED);
-        assert(preallocated_nb_clusters > 0);
-
-        nb_clusters = preallocated_nb_clusters;
-        alloc_cluster_offset = entry & L2E_OFFSET_MASK;
-
-        /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
-         * should not free them. */
-        keep_old_clusters = true;
+    /* Allocate, if necessary at a given offset in the image file */
+    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
+        start_of_cluster(s, *host_offset);
+    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
+                                  &nb_clusters);
+    if (ret < 0) {
+        goto out;
     }
 
-    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-
-    if (alloc_cluster_offset == INV_OFFSET) {
-        /* Allocate, if necessary at a given offset in the image file */
-        alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
-                               start_of_cluster(s, *host_offset);
-        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
-                                      &nb_clusters);
-        if (ret < 0) {
-            goto fail;
-        }
-
-        /* Can't extend contiguous allocation */
-        if (nb_clusters == 0) {
-            *bytes = 0;
-            return 0;
-        }
-
-        assert(alloc_cluster_offset != INV_OFFSET);
+    /* Can't extend contiguous allocation */
+    if (nb_clusters == 0) {
+        *bytes = 0;
+        ret = 0;
+        goto out;
     }
 
+    assert(alloc_cluster_offset != INV_OFFSET);
+
     /*
      * Save info needed for meta data update.
      *
@@ -1475,13 +1489,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
-                      m, keep_old_clusters);
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
+                      m, false);
 
-    return 1;
+    ret = 1;
 
-fail:
-    if (*m && (*m)->nb_clusters > 0) {
+out:
+    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
+    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
         QLIST_REMOVE(*m, next_in_flight);
     }
     return ret;
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 04/26] qcow2: Add get_l2_entry() and set_l2_entry()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (2 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature Alberto Garcia
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c  | 65 ++++++++++++++++++++++--------------------
 block/qcow2-refcount.c | 17 +++++------
 block/qcow2.h          | 12 ++++++++
 3 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ee6b46f917..581fa90ab1 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -379,12 +379,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
     QCow2ClusterType first_cluster_type;
     uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t offset = first_entry & mask;
 
     first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -397,7 +398,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
            first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
         if (offset + (uint64_t) i * cluster_size != l2_entry) {
             break;
         }
@@ -413,14 +414,16 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
                                                  int nb_clusters,
                                                  uint64_t *l2_slice,
+                                                 int l2_index,
                                                  QCow2ClusterType wanted_type)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
 
     assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
            wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = be64_to_cpu(l2_slice[i]);
+        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
         QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
         if (type != wanted_type) {
@@ -566,7 +569,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
-    *cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    *cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -601,14 +604,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
         c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  &l2_slice[l2_index], type);
+                                                  l2_slice, l2_index, type);
         *cluster_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_ZERO);
+                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -761,7 +764,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     /* Compression can't overwrite anything. Fail if the cluster was already
      * allocated. */
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
     if (cluster_offset & L2E_OFFSET_MASK) {
         qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
         return -EIO;
@@ -786,7 +789,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-    l2_slice[l2_index] = cpu_to_be64(cluster_offset);
+    set_l2_entry(s, l2_slice, l2_index, cluster_offset);
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
@@ -978,12 +981,12 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
          * cluster the second one has to do RMW (which is done above by
          * perform_cow()), update l2 table with its cluster pointer and free
          * old cluster. This is what this loop does */
-        if (l2_slice[l2_index + i] != 0) {
-            old_cluster[j++] = l2_slice[l2_index + i];
+        if (get_l2_entry(s, l2_slice, l2_index + i) != 0) {
+            old_cluster[j++] = get_l2_entry(s, l2_slice, l2_index + i);
         }
 
-        l2_slice[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+        set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
+                     (cluster_offset + (i << s->cluster_bits)));
      }
 
 
@@ -997,8 +1000,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      */
     if (!m->keep_old_clusters && j != 0) {
         for (i = 0; i < j; i++) {
-            qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
-                                    QCOW2_DISCARD_NEVER);
+            qcow2_free_any_clusters(bs, old_cluster[i], 1, QCOW2_DISCARD_NEVER);
         }
     }
 
@@ -1051,7 +1053,7 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     if (keep_old) {
         int i;
         for (i = 0; i < nb_clusters; i++) {
-            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
             if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
                 break;
             }
@@ -1062,7 +1064,7 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     }
 
     /* Get the L2 entry from the first cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1072,7 +1074,7 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     }
 
     /* Get the L2 entry from the last cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1139,12 +1141,12 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
                               uint64_t *l2_slice, int l2_index, bool want_cow)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         if (cluster_needs_cow(bs, l2_entry) != want_cow) {
             break;
         }
@@ -1277,7 +1279,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     if (!cluster_needs_cow(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
@@ -1658,7 +1660,7 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     for (i = 0; i < nb_clusters; i++) {
         uint64_t old_l2_entry;
 
-        old_l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * If full_discard is false, make sure that a discarded area reads back
@@ -1698,9 +1700,9 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (!full_discard && s->qcow_version >= 3) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
-            l2_slice[l2_index + i] = cpu_to_be64(0);
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
         }
 
         /* Then decrease the refcount */
@@ -1780,7 +1782,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         uint64_t old_offset;
         QCow2ClusterType cluster_type;
 
-        old_offset = be64_to_cpu(l2_slice[l2_index + i]);
+        old_offset = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * Minimize L2 changes if the cluster already reads back as
@@ -1794,10 +1796,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            l2_slice[l2_index + i] |= cpu_to_be64(QCOW_OFLAG_ZERO);
+            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
+            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
         }
     }
 
@@ -1935,7 +1938,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             }
 
             for (j = 0; j < s->l2_slice_size; j++) {
-                uint64_t l2_entry = be64_to_cpu(l2_slice[j]);
+                uint64_t l2_entry = get_l2_entry(s, l2_slice, j);
                 int64_t offset = l2_entry & L2E_OFFSET_MASK;
                 QCow2ClusterType cluster_type =
                     qcow2_get_cluster_type(bs, l2_entry);
@@ -1949,7 +1952,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                     if (!bs->backing) {
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
-                        l2_slice[j] = 0;
+                        set_l2_entry(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
                     }
@@ -2012,9 +2015,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 }
 
                 if (l2_refcount == 1) {
-                    l2_slice[j] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
+                    set_l2_entry(s, l2_slice, j, offset | QCOW_OFLAG_COPIED);
                 } else {
-                    l2_slice[j] = cpu_to_be64(offset);
+                    set_l2_entry(s, l2_slice, j, offset);
                 }
                 l2_dirty = true;
             }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0d64bf5a5e..84fe02d388 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1309,7 +1309,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     uint64_t cluster_index;
                     uint64_t offset;
 
-                    entry = be64_to_cpu(l2_slice[j]);
+                    entry = get_l2_entry(s, l2_slice, j);
                     old_entry = entry;
                     entry &= ~QCOW_OFLAG_COPIED;
                     offset = entry & L2E_OFFSET_MASK;
@@ -1383,7 +1383,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                             qcow2_cache_set_dependency(bs, s->l2_table_cache,
                                                        s->refcount_block_cache);
                         }
-                        l2_slice[j] = cpu_to_be64(entry);
+                        set_l2_entry(s, l2_slice, j, entry);
                         qcow2_cache_entry_mark_dirty(s->l2_table_cache,
                                                      l2_slice);
                     }
@@ -1616,7 +1616,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 
     /* Do the actual checks */
     for(i = 0; i < s->l2_size; i++) {
-        l2_entry = be64_to_cpu(l2_table[i]);
+        l2_entry = get_l2_entry(s, l2_table, i);
 
         switch (qcow2_get_cluster_type(bs, l2_entry)) {
         case QCOW2_CLUSTER_COMPRESSED:
@@ -1685,7 +1685,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
-                        l2_table[i] = cpu_to_be64(l2_entry);
+                        set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, sizeof(uint64_t), false);
                         if (ret < 0) {
@@ -1913,7 +1913,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         for (j = 0; j < s->l2_size; j++) {
-            uint64_t l2_entry = be64_to_cpu(l2_table[j]);
+            uint64_t l2_entry = get_l2_entry(s, l2_table, j);
             uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
             QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
 
@@ -1936,9 +1936,10 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                             "l2_entry=%" PRIx64 " refcount=%" PRIu64 "\n",
                             repair ? "Repairing" : "ERROR", l2_entry, refcount);
                     if (repair) {
-                        l2_table[j] = cpu_to_be64(refcount == 1
-                                    ? l2_entry |  QCOW_OFLAG_COPIED
-                                    : l2_entry & ~QCOW_OFLAG_COPIED);
+                        set_l2_entry(s, l2_table, j,
+                                     refcount == 1 ?
+                                     l2_entry |  QCOW_OFLAG_COPIED :
+                                     l2_entry & ~QCOW_OFLAG_COPIED);
                         l2_dirty++;
                     }
                 }
diff --git a/block/qcow2.h b/block/qcow2.h
index 5cccd87162..940cd4c236 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -488,6 +488,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                    int idx)
+{
+    return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                int idx, uint64_t entry)
+{
+    l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (3 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 04/26] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-30 16:23   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 06/26] qcow2: Add dummy has_subclusters() function Alberto Garcia
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
 docs/qcow2-cache.txt   | 19 +++++++++++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..d34261f955 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
                     as the maximum cluster size and won't be able to open images
                     with larger cluster sizes.
 
+                    Note: if the image has Extended L2 Entries then cluster_bits
+                    must be at least 14 (i.e. 16384 byte clusters).
+
          24 - 31:   size
                     Virtual disk size in bytes.
 
@@ -109,7 +112,12 @@ in the description of a field.
                                 An External Data File Name header extension may
                                 be present if this bit is set.
 
-                    Bits 3-63:  Reserved (set to 0)
+                    Bit 3:      Extended L2 Entries.  If this bit is set then
+                                L2 table entries use an extended format that
+                                allows subcluster-based allocation. See the
+                                Extended L2 Entries section for more details.
+
+                    Bits 4-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-    l2_entries = (cluster_size / sizeof(uint64_t))
+    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
 
     l2_index = (offset / cluster_size) % l2_entries
     l1_index = (offset / cluster_size) / l2_entries
@@ -447,6 +455,8 @@ obtained as follows:
 
     return cluster_offset + (offset % cluster_size)
 
+    [*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
     Bit  0 -  8:    Reserved (set to 0)
@@ -487,7 +497,8 @@ Standard Cluster Descriptor:
                     nor is data read from the backing file if the cluster is
                     unallocated.
 
-                    With version 2, this is always 0.
+                    With version 2 or with extended L2 entries (see the next
+                    section), this is always 0.
 
          1 -  8:    Reserved (set to 0)
 
@@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated like in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+    Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+                    1: the subcluster is allocated. In this case the
+                       host cluster offset field must contain a valid
+                       offset.
+                    0: the subcluster is not allocated. In this case
+                       read requests shall go to the backing file or
+                       return zeros if there is no backing file data.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 31 - x)
+
+        32 -  63    Subcluster reads as zeros (one bit per subcluster)
+
+                    1: the subcluster reads as zeros. In this case the
+                       allocation status bit must be unset. The host
+                       cluster offset field may or may not be set.
+                    0: no effect.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 63 - x)
+
+Subcluster Allocation Bitmap (for compressed clusters):
+
+    Bit  0 -  63:   Reserved (set to 0)
+                    Compressed clusters don't have subclusters,
+                    so this field is not used.
 
 == Snapshots ==
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index d57f409861..04eb4ce2f1 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@
 qcow2 L2/refcount cache configuration
 =====================================
-Copyright (C) 2015, 2018 Igalia, S.L.
+Copyright (C) 2015, 2018-2019 Igalia, S.L.
 Author: Alberto Garcia <berto@igalia.com>
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -222,3 +222,20 @@ support this functionality, and is 0 (disabled) on other platforms.
 This functionality currently relies on the MADV_DONTNEED argument for
 madvise() to actually free the memory. This is a Linux-specific feature,
 so cache-clean-interval is not supported on other systems.
+
+
+Extended L2 Entries
+-------------------
+All numbers shown in this document are valid for qcow2 images with normal
+64-bit L2 entries.
+
+Images with extended L2 entries need twice as much L2 metadata, so the L2
+cache size must be twice as large for the same disk space.
+
+   disk_size = l2_cache_size * cluster_size / 16
+
+i.e.
+
+   l2_cache_size = disk_size * 16 / cluster_size
+
+Refcount blocks are not affected by this.
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 06/26] qcow2: Add dummy has_subclusters() function
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (4 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 07/26] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 940cd4c236..b3826b37c1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -488,6 +488,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+    /* FIXME: Return false until this feature is complete */
+    return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 07/26] qcow2: Add subcluster-related fields to BDRVQcow2State
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (5 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 06/26] qcow2: Add dummy has_subclusters() function Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 08/26] qcow2: Add offset_to_sc_index() Alberto Garcia
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 5 +++++
 block/qcow2.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0bc69e6996..ed8b81c7b7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1342,6 +1342,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->subclusters_per_cluster =
+        has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
+    s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+    s->subcluster_bits = ctz32(s->subcluster_size);
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
diff --git a/block/qcow2.h b/block/qcow2.h
index b3826b37c1..278ca41314 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -75,6 +75,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -277,6 +279,9 @@ typedef struct BDRVQcow2State {
     int cluster_bits;
     int cluster_size;
     int l2_slice_size;
+    int subcluster_bits;
+    int subcluster_size;
+    int subclusters_per_cluster;
     int l2_bits;
     int l2_size;
     int l1_size;
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 08/26] qcow2: Add offset_to_sc_index()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (6 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 07/26] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 09/26] qcow2: Add l2_entry_size() Alberto Garcia
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 278ca41314..e25758079c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -559,6 +559,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State *s, int64_t offset)
     return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+    return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
     return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 09/26] qcow2: Add l2_entry_size()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (7 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 08/26] qcow2: Add offset_to_sc_index() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-30 16:47   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c  | 12 ++++++------
 block/qcow2-refcount.c | 14 ++++++++------
 block/qcow2.c          |  6 +++---
 block/qcow2.h          |  5 +++++
 4 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 581fa90ab1..1f509bda15 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -209,7 +209,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
                    uint64_t l2_offset, uint64_t **l2_slice)
 {
     BDRVQcow2State *s = bs->opaque;
-    int start_of_slice = sizeof(uint64_t) *
+    int start_of_slice = l2_entry_size(s) *
         (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
     return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -277,7 +277,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new l2 entry */
 
-    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
     if (l2_offset < 0) {
         ret = l2_offset;
         goto fail;
@@ -301,7 +301,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new entry in the l2 cache */
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -365,7 +365,7 @@ fail:
     }
     s->l1_table[l1_index] = old_l2_offset;
     if (l2_offset > 0) {
-        qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+        qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                             QCOW2_DISCARD_ALWAYS);
     }
     return ret;
@@ -708,7 +708,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
         /* Then decrease the refcount of the old table */
         if (l2_offset) {
-            qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+            qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                                 QCOW2_DISCARD_OTHER);
         }
 
@@ -1883,7 +1883,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     int ret;
     int i, j;
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 84fe02d388..621318447d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1253,7 +1253,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     l2_slice = NULL;
     l1_table = NULL;
     l1_size2 = l1_size * sizeof(uint64_t);
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     s->cache_discards = true;
@@ -1604,7 +1604,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size, nb_csectors, ret;
 
     /* Read L2 table from disk */
-    l2_size = s->l2_size * sizeof(uint64_t);
+    l2_size = s->l2_size * l2_entry_size(s);
     l2_table = g_malloc(l2_size);
 
     ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1679,15 +1679,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                             fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
                             offset);
                     if (fix & BDRV_FIX_ERRORS) {
+                        int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
                         uint64_t l2e_offset =
-                            l2_offset + (uint64_t)i * sizeof(uint64_t);
+                            l2_offset + (uint64_t)i * l2_entry_size(s);
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
                         set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
-                                l2e_offset, sizeof(uint64_t), false);
+                                l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Overlap check failed\n");
                             res->check_errors++;
@@ -1697,7 +1698,8 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         }
 
                         ret = bdrv_pwrite_sync(bs->file, l2e_offset,
-                                               &l2_table[i], sizeof(uint64_t));
+                                               &l2_table[idx],
+                                               l2_entry_size(s));
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Failed to overwrite L2 "
                                     "table entry: %s\n", strerror(-ret));
@@ -1904,7 +1906,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         ret = bdrv_pread(bs->file, l2_offset, l2_table,
-                         s->l2_size * sizeof(uint64_t));
+                         s->l2_size * l2_entry_size(s));
         if (ret < 0) {
             fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
                     strerror(-ret));
diff --git a/block/qcow2.c b/block/qcow2.c
index ed8b81c7b7..ab40ae36ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -832,7 +832,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
     /* An L2 table is always one cluster in size so the max cache size
      * should be a multiple of the cluster size. */
-    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
+    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * l2_entry_size(s),
                                      s->cluster_size);
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
@@ -993,7 +993,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
         }
     }
 
-    r->l2_slice_size = l2_cache_entry_size / sizeof(uint64_t);
+    r->l2_slice_size = l2_cache_entry_size / l2_entry_size(s);
     r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size,
                                            l2_cache_entry_size);
     r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size,
@@ -1387,7 +1387,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         bs->encrypted = true;
     }
 
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    s->l2_bits = s->cluster_bits - ctz32(l2_entry_size(s));
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
     s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
diff --git a/block/qcow2.h b/block/qcow2.h
index e25758079c..29a253bfb9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -499,6 +499,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
     return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+    return has_subclusters(s) ? sizeof(uint64_t) * 2 : sizeof(uint64_t);
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (8 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 09/26] qcow2: Add l2_entry_size() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-30 16:55   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are used
to access the bitmaps. For convenience, these functions are no-ops
when used in traditional qcow2 images.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 29a253bfb9..741c41c80f 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -507,15 +507,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                     int idx)
+{
+    if (has_subclusters(s)) {
+        idx *= l2_entry_size(s) / sizeof(uint64_t);
+        return be64_to_cpu(l2_slice[idx + 1]);
+    } else {
+        return 0;
+    }
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                 int idx, uint64_t entry)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                 int idx, uint64_t bitmap)
+{
+    if (has_subclusters(s)) {
+        idx *= l2_entry_size(s) / sizeof(uint64_t);
+        l2_slice[idx + 1] = cpu_to_be64(bitmap);
+    }
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (9 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-04 12:35   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This function returns the type of an individual subcluster. If an
image does not have subclusters then this returns the exact same value
as qcow2_get_cluster_type().

The information in standard and extended L2 entries is encoded in a
slightly different way, but all existing QCow2ClusterType values are
also valid for subclusters and have the same meanings (although they
typically only apply to the requested subcluster).

There are two important exceptions to this:

  a) QCOW2_CLUSTER_COMPRESSED means that the whole cluster is
     compressed. We do not support compression at the subcluster
     level.

  b) QCOW2_CLUSTER_UNALLOCATED means that the cluster is unallocated,
     that is, the offset field of the L2 entry does not point to a
     host cluster. All subclusters are obviously unallocated too but
     any of them could be of type QCOW2_CLUSTER_ZERO_PLAIN.

In addition to that, extended L2 entries allow one new scenario where
the cluster is normally allocated but an individual subcluster is not.
This is very different from (b) and because of that this patch adds a
new value called QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER.

As a last thing, this patch adds QCOW2_CLUSTER_INVALID to detect the
cases where an L2 entry has a value that violates the spec. The caller
is responsible for handling these situations.

To prevent compatibility problems with images that have invalid values
but are currently being read by QEMU without causing side effects,
QCOW2_CLUSTER_INVALID is only returned for images with extended L2
entries.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 741c41c80f..23a2532ff2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -77,6 +77,15 @@
 
 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)    ((1ULL << 63) >> (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   ((1ULL << 31) >> (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xFFFFFFFF00000000ULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0x00000000FFFFFFFFULL
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -438,10 +447,12 @@ typedef struct QCowL2Meta
 
 typedef enum QCow2ClusterType {
     QCOW2_CLUSTER_UNALLOCATED,
+    QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER,
     QCOW2_CLUSTER_ZERO_PLAIN,
     QCOW2_CLUSTER_ZERO_ALLOC,
     QCOW2_CLUSTER_NORMAL,
     QCOW2_CLUSTER_COMPRESSED,
+    QCOW2_CLUSTER_INVALID,
 } QCow2ClusterType;
 
 typedef enum QCow2MetadataOverlap {
@@ -621,6 +632,57 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }
 
+/* In an image without subsclusters this returns the same value as
+ * qcow2_get_cluster_type() */
+static inline int qcow2_get_subcluster_type(BlockDriverState *bs,
+                                            uint64_t l2_entry,
+                                            uint64_t l2_bitmap,
+                                            unsigned sc_index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+    assert(sc_index < s->subclusters_per_cluster);
+
+    if (has_subclusters(s)) {
+        bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+        bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            if (l2_bitmap != 0) {
+                return QCOW2_CLUSTER_INVALID;
+            }
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_CLUSTER_INVALID;
+        case QCOW2_CLUSTER_NORMAL:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_CLUSTER_NORMAL;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_ZERO_ALLOC;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_CLUSTER_INVALID;
+            }
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_UNALLOCATED;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_CLUSTER_INVALID;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_ZERO_PLAIN;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_CLUSTER_INVALID;
+            }
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return type;
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (10 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-04 12:57   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
                   ` (14 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

In the previous patch we added a new QCow2ClusterType named
QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
where this new value needs to be handled, and that is what this patch
does.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ab40ae36ea..0261e87709 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1938,8 +1938,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
     *pnum = bytes;
 
-    if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
-        !s->crypto) {
+    if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC ||
+         ret == QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) && !s->crypto) {
         index_in_cluster = offset & (s->cluster_size - 1);
         *map = cluster_offset | index_in_cluster;
         *file = s->data_file->bs;
@@ -1947,7 +1947,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     }
     if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+    } else if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+               ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2117,6 +2118,7 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
         g_assert_not_reached();
 
     case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
         assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2187,7 +2189,8 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
         if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
             ret == QCOW2_CLUSTER_ZERO_ALLOC ||
-            (ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+            (ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing) ||
+            (ret == QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER && !bs->backing))
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
@@ -3701,6 +3704,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
         if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+            ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER &&
             ret != QCOW2_CLUSTER_ZERO_PLAIN &&
             ret != QCOW2_CLUSTER_ZERO_ALLOC) {
             qemu_co_mutex_unlock(&s->lock);
@@ -3771,6 +3775,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
         switch (ret) {
         case QCOW2_CLUSTER_UNALLOCATED:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (11 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-04 14:21   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

   - If the cluster is new, then subclusters #0 to #3 from the old
     cluster must be copied into the new one.

   - If the cluster is new but the old cluster was unallocated, then
     only subcluster #3 needs copy-on-write. #0 to #2 are marked as
     unallocated in the bitmap of the new L2 entry.

   - If we are overwriting an old cluster and subcluster #3 is
     unallocated or has the all-zeroes bit set then we need
     copy-on-write on subcluster #3.

   - If we are overwriting an old cluster and subcluster #3 was
     allocated then there is no need to copy-on-write.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 136 +++++++++++++++++++++++++++++++++---------
 1 file changed, 108 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 1f509bda15..990bc070af 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1034,14 +1034,16 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 1 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
-                              uint64_t guest_offset, uint64_t bytes,
-                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
+                             uint64_t guest_offset, uint64_t bytes,
+                             uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    int l2_index = offset_to_l2_slice_index(s, guest_offset);
-    uint64_t l2_entry;
+    int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry, l2_bitmap;
     unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
@@ -1049,38 +1051,108 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QCowL2Meta *old_m = *m;
     QCow2ClusterType type;
 
-    /* Return if there's no COW (all clusters are normal and we keep them) */
+    /* Return if there's no COW (all subclusters are normal and we are
+     * keeping the clusters) */
     if (keep_old) {
+        unsigned first_sc = cow_start_to / s->subcluster_size;
+        unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
         int i;
-        for (i = 0; i < nb_clusters; i++) {
-            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+        for (i = first_sc; i <= last_sc; i++) {
+            unsigned c = i / s->subclusters_per_cluster;
+            unsigned sc = i % s->subclusters_per_cluster;
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+            l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+            type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+            if (type == QCOW2_CLUSTER_INVALID) {
+                l2_index += c; /* Point to the invalid entry */
+                goto fail;
+            }
+            if (type != QCOW2_CLUSTER_NORMAL) {
                 break;
             }
         }
-        if (i == nb_clusters) {
-            return;
+        if (i == last_sc + 1) {
+            return 1;
         }
     }
 
     /* Get the L2 entry from the first cluster */
     l2_entry = get_l2_entry(s, l2_slice, l2_index);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_start_from = cow_start_to;
+    if (type == QCOW2_CLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_CLUSTER_COMPRESSED:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_start_from = 0;
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_UNALLOCATED:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_start_from = 0;
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+            cow_start_from = cow_start_to;
+            break;
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     /* Get the L2 entry from the last cluster */
-    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_index += nb_clusters - 1;
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_end_to = cow_end_from;
+    if (type == QCOW2_CLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_CLUSTER_COMPRESSED:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_UNALLOCATED:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+            cow_end_to = cow_end_from;
+            break;
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     *m = g_malloc0(sizeof(**m));
@@ -1105,6 +1177,18 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
 
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+fail:
+    if (type == QCOW2_CLUSTER_INVALID) {
+        uint64_t l1_index = offset_to_l1_index(s, guest_offset);
+        uint64_t l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found "
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        return -EIO;
+    }
+
+    return 1;
 }
 
 /* Returns true if the cluster is unallocated or has refcount > 1 */
@@ -1313,10 +1397,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
                  - offset_into_cluster(s, guest_offset));
         assert(*bytes != 0);
 
-        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
-                          *bytes, l2_slice, m, true);
-
-        ret = 1;
+        ret = calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK,
+                                guest_offset, *bytes, l2_slice, m, true);
     } else {
         ret = 0;
     }
@@ -1491,10 +1573,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
-                      m, false);
-
-    ret = 1;
+    ret = calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                            l2_slice, m, false);
 
 out:
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (12 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-04 14:58   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 111 ++++++++++++++++++++----------------------
 1 file changed, 52 insertions(+), 59 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 990bc070af..e67559152f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -372,66 +372,51 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. At most @nb_clusters are checked. Allocated clusters are
+ * also required to be contiguous in the image file.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+                                        unsigned sc_index, uint64_t *l2_slice,
+                                        int l2_index)
 {
     BDRVQcow2State *s = bs->opaque;
-    int i;
-    QCow2ClusterType first_cluster_type;
-    uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-    uint64_t offset = first_entry & mask;
-
-    first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-    if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-        return 0;
+    int i, j, count = 0;
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+    bool check_offset = true;
+    QCow2ClusterType type =
+        qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+    assert(type != QCOW2_CLUSTER_INVALID); /* The caller should check this */
+
+    if (type == QCOW2_CLUSTER_COMPRESSED) {
+        return 1; /* Compressed clusters are always counted one by one */
     }
 
-    /* must be allocated */
-    assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-           first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-    for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-        if (offset + (uint64_t) i * cluster_size != l2_entry) {
-            break;
-        }
+    if (type == QCOW2_CLUSTER_UNALLOCATED || type == QCOW2_CLUSTER_ZERO_PLAIN) {
+        check_offset = false;
     }
 
-        return i;
-}
-
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * slice have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
-                                                 int nb_clusters,
-                                                 uint64_t *l2_slice,
-                                                 int l2_index,
-                                                 QCow2ClusterType wanted_type)
-{
-    BDRVQcow2State *s = bs->opaque;
-    int i;
-
-    assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-           wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-        QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
-
-        if (type != wanted_type) {
-            break;
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+        l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+        if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            goto out;
+        }
+        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
+            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
+                goto out;
+            }
+            count++;
         }
+        expected_offset += s->cluster_size;
     }
 
-    return i;
+out:
+    return count;
 }
 
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
@@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
                              unsigned int *bytes, uint64_t *cluster_offset)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned int l2_index;
-    uint64_t l1_index, l2_offset, *l2_slice;
+    unsigned int l2_index, sc_index;
+    uint64_t l1_index, l2_offset, *l2_slice, l2_bitmap;
     int c;
     unsigned int offset_in_cluster;
     uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -569,7 +554,9 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
+    sc_index = offset_to_sc_index(s, offset);
     *cluster_offset = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -577,7 +564,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
      * true */
     assert(nb_clusters <= INT_MAX);
 
-    type = qcow2_get_cluster_type(bs, *cluster_offset);
+    type = qcow2_get_subcluster_type(bs, *cluster_offset, l2_bitmap, sc_index);
     if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
                                 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
         qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -587,6 +574,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         goto fail;
     }
     switch (type) {
+    case QCOW2_CLUSTER_INVALID:
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found "
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        ret = -EIO;
+        goto fail;
+        break;
     case QCOW2_CLUSTER_COMPRESSED:
         if (has_data_file(bs)) {
             qcow2_signal_corruption(bs, true, -1, -1, "Compressed cluster "
@@ -602,16 +596,15 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         break;
     case QCOW2_CLUSTER_ZERO_PLAIN:
     case QCOW2_CLUSTER_UNALLOCATED:
-        /* how many empty clusters ? */
-        c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  l2_slice, l2_index, type);
+        c = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                         l2_slice, l2_index);
         *cluster_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
-        /* how many allocated clusters ? */
-        c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
+    case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+        c = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                         l2_slice, l2_index);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -640,7 +633,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
-    bytes_available = (int64_t)c * s->cluster_size;
+    bytes_available = ((int64_t)c + sc_index) * s->subcluster_size;
 
 out:
     if (bytes_available > bytes_needed) {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (13 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-04 15:04   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e67559152f..3e4ba8d448 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1852,7 +1852,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     assert(nb_clusters <= INT_MAX);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t old_offset;
+        uint64_t old_offset, l2_entry = 0;
         QCow2ClusterType cluster_type;
 
         old_offset = get_l2_entry(s, l2_slice, l2_index + i);
@@ -1869,12 +1869,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         }
+
+        if (has_subclusters(s)) {
+            set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
+        } else {
+            l2_entry |= QCOW_OFLAG_ZERO;
+        }
+
+        set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (14 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-04 15:07   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 17/26] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3e4ba8d448..aa3eb727a5 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1772,7 +1772,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (!full_discard && s->qcow_version >= 3) {
+        if (has_subclusters(s)) {
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
+            set_l2_bitmap(s, l2_slice, l2_index + i,
+                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+        } else if (!full_discard && s->qcow_version >= 3) {
             set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
             set_l2_entry(s, l2_slice, l2_index + i, 0);
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 17/26] qcow2: Add subcluster support to check_refcounts_l2()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (15 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-refcount.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 621318447d..bc73125f70 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1685,8 +1685,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
-                        l2_entry = QCOW_OFLAG_ZERO;
-                        set_l2_entry(s, l2_table, i, l2_entry);
+                        if (has_subclusters(s)) {
+                            set_l2_entry(s, l2_table, i, 0);
+                            set_l2_bitmap(s, l2_table, i,
+                                          QCOW_L2_BITMAP_ALL_ZEROES);
+                        } else {
+                            set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+                        }
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (16 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 17/26] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-05 11:05   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 19/26] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Two changes are needed in order to add subcluster support to this
function: deallocated clusters must have their bitmaps cleared, and
expanded clusters must have all the "subcluster allocated" bits set.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index aa3eb727a5..62f2a9fcc0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2036,6 +2036,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
                         set_l2_entry(s, l2_slice, j, 0);
+                        set_l2_bitmap(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
                     }
@@ -2102,6 +2103,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 } else {
                     set_l2_entry(s, l2_slice, j, offset);
                 }
+                set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
                 l2_dirty = true;
             }
 
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 19/26] qcow2: Fix offset calculation in handle_dependencies()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (17 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

l2meta_cow_start() and l2meta_cow_end() are not necessarily
cluster-aligned if the image has subclusters, so update the
calculation of old_start and old_end to guarantee that no two requests
try to write on the same cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 62f2a9fcc0..fb6cf8df17 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1262,8 +1262,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
         uint64_t start = guest_offset;
         uint64_t end = start + bytes;
-        uint64_t old_start = l2meta_cow_start(old_alloc);
-        uint64_t old_end = l2meta_cow_end(old_alloc);
+        uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+        uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size);
 
         if (end <= old_start || start >= old_end) {
             /* No intersection */
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (18 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 19/26] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-05 11:43   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 21/26] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
                   ` (6 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fb6cf8df17..acb7226e03 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -980,6 +980,22 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
         set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
                      (cluster_offset + (i << s->cluster_bits)));
+
+        /* Update bitmap with the subclusters that were just written */
+        if (has_subclusters(s)) {
+            uint64_t written_from = m->cow_start.offset;
+            uint64_t written_to = m->cow_end.offset + m->cow_end.nb_bytes;
+            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+            int sc;
+            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+                uint64_t sc_off = i * s->cluster_size + sc * s->subcluster_size;
+                if (sc_off >= written_from && sc_off < written_to) {
+                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+                }
+            }
+            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+        }
      }
 
 
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 21/26] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (19 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index acb7226e03..3ba8a98073 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -783,6 +783,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
     set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+    set_l2_bitmap(s, l2_slice, l2_index, 0);
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space()
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (20 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 21/26] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-05 12:05   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 23/26] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
                   ` (4 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0261e87709..01322ca449 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2304,6 +2304,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
+        uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+        uint64_t nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+            m->cow_start.offset;
 
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
             continue;
@@ -2318,16 +2321,14 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
          * efficiently zero out the whole clusters
          */
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-                                            m->nb_clusters * s->cluster_size,
+        ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
                                             true);
         if (ret < 0) {
             return ret;
         }
 
         BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-        ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-                                    m->nb_clusters * s->cluster_size,
+        ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
                                     BDRV_REQ_NO_FALLBACK);
         if (ret < 0) {
             if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 23/26] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (21 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-10-26 21:25 ` [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Ideally it should be possible to zero individual subclusters using
this function, but this is currently not implemented.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 01322ca449..537569ce88 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3704,6 +3704,12 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         bytes = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
+        /* TODO: allow zeroing separate subclusters, we only allow
+         * zeroing full clusters at the moment. */
+        if (nr != bytes) {
+            qemu_co_mutex_unlock(&s->lock);
+            return -ENOTSUP;
+        }
         if (ret != QCOW2_CLUSTER_UNALLOCATED &&
             ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER &&
             ret != QCOW2_CLUSTER_ZERO_PLAIN &&
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (22 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 23/26] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-05 12:47   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set Alberto Garcia
                   ` (2 subsequent siblings)
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c                    |  46 ++++++++++++++
 block/qcow2.h                    |   8 ++-
 include/block/block_int.h        |   1 +
 qapi/block-core.json             |   6 ++
 tests/qemu-iotests/031.out       |   8 +--
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 +++++++++++++++----------------
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +++---
 tests/qemu-iotests/065           |  18 ++++--
 tests/qemu-iotests/082.out       |  48 ++++++++++++---
 tests/qemu-iotests/085.out       |  38 ++++++------
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +--
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 ++
 tests/qemu-iotests/242.out       |   5 ++
 tests/qemu-iotests/255.out       |   8 +--
 tests/qemu-iotests/common.filter |   1 +
 20 files changed, 224 insertions(+), 110 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 537569ce88..b1fa7ab5da 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1347,6 +1347,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
     s->subcluster_bits = ctz32(s->subcluster_size);
 
+    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
+        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
+        ret = -EINVAL;
+        goto fail;
+    }
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
@@ -2806,6 +2812,11 @@ int qcow2_update_header(BlockDriverState *bs)
                 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
                 .name = "lazy refcounts",
             },
+            {
+                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
+                .name = "extended L2 entries",
+            },
         };
 
         ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -3271,6 +3282,27 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         goto out;
     }
 
+    if (!qcow2_opts->has_extended_l2) {
+        qcow2_opts->extended_l2 = false;
+    }
+    if (qcow2_opts->extended_l2) {
+        unsigned min_cluster_size =
+            (1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER;
+        if (version < 3) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "compatibility level 1.1 and above (use version=v3 or "
+                       "greater)");
+            ret = -EINVAL;
+            goto out;
+        }
+        if (cluster_size < min_cluster_size) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "cluster sizes of at least %u bytes", min_cluster_size);
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
     if (!qcow2_opts->has_preallocation) {
         qcow2_opts->preallocation = PREALLOC_MODE_OFF;
     }
@@ -3392,6 +3424,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
     }
 
+    if (qcow2_opts->extended_l2) {
+        header->incompatible_features |=
+            cpu_to_be64(QCOW2_INCOMPAT_EXTL2);
+    }
+
     ret = blk_pwrite(blk, 0, header, cluster_size, 0);
     g_free(header);
     if (ret < 0) {
@@ -3569,6 +3606,7 @@ static int coroutine_fn qcow2_co_create_opts(const char *filename, QemuOpts *opt
         { BLOCK_OPT_BACKING_FMT,        "backing-fmt" },
         { BLOCK_OPT_CLUSTER_SIZE,       "cluster-size" },
         { BLOCK_OPT_LAZY_REFCOUNTS,     "lazy-refcounts" },
+        { BLOCK_OPT_EXTL2,              "extended-l2" },
         { BLOCK_OPT_REFCOUNT_BITS,      "refcount-bits" },
         { BLOCK_OPT_ENCRYPT,            BLOCK_OPT_ENCRYPT_FORMAT },
         { BLOCK_OPT_COMPAT_LEVEL,       "version" },
@@ -4772,6 +4810,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .has_extended_l2    = true,
+            .extended_l2        = has_subclusters(s),
             .refcount_bits      = s->refcount_bits,
             .has_bitmaps        = !!bitmaps,
             .bitmaps            = bitmaps,
@@ -5365,6 +5405,12 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_EXTL2,
+            .type = QEMU_OPT_BOOL,
+            .help = "Extended L2 tables",
+            .def_value_str = "off"
+        },
         {
             .name = BLOCK_OPT_REFCOUNT_BITS,
             .type = QEMU_OPT_NUMBER,
diff --git a/block/qcow2.h b/block/qcow2.h
index 23a2532ff2..3832e8ccc9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -220,13 +220,16 @@ enum {
     QCOW2_INCOMPAT_DIRTY_BITNR      = 0,
     QCOW2_INCOMPAT_CORRUPT_BITNR    = 1,
     QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+    QCOW2_INCOMPAT_EXTL2_BITNR      = 3,
     QCOW2_INCOMPAT_DIRTY            = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_CORRUPT          = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
     QCOW2_INCOMPAT_DATA_FILE        = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
+    QCOW2_INCOMPAT_EXTL2            = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
     QCOW2_INCOMPAT_MASK             = QCOW2_INCOMPAT_DIRTY
                                     | QCOW2_INCOMPAT_CORRUPT
-                                    | QCOW2_INCOMPAT_DATA_FILE,
+                                    | QCOW2_INCOMPAT_DATA_FILE
+                                    | QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -506,8 +509,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-    /* FIXME: Return false until this feature is complete */
-    return false;
+    return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ca4ccac4c1..bf11764155 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,7 @@
 #define BLOCK_OPT_REFCOUNT_BITS     "refcount_bits"
 #define BLOCK_OPT_DATA_FILE         "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW     "data_file_raw"
+#define BLOCK_OPT_EXTL2             "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index aa97ee2641..6e4c81ae1c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -66,6 +66,8 @@
 #                 standalone (read-only) raw image without looking at qcow2
 #                 metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries (since 4.2)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -85,6 +87,7 @@
       'compat': 'str',
       '*data-file': 'str',
       '*data-file-raw': 'bool',
+      '*extended-l2': 'bool',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
       'refcount-bits': 'int',
@@ -4372,6 +4375,8 @@
 # @data-file-raw    True if the external data file must stay valid as a
 #                   standalone (read-only) raw image without looking at qcow2
 #                   metadata (default: false; since: 4.0)
+# @extended-l2      True if the image has extended L2 entries
+#                   (default: false; since 4.2)
 # @size             Size of the virtual disk in bytes
 # @version          Compatibility level (default: v3)
 # @backing-file     File name of the backing file if a backing file
@@ -4390,6 +4395,7 @@
   'data': { 'file':             'BlockdevRef',
             '*data-file':       'BlockdevRef',
             '*data-file-raw':   'bool',
+            '*extended-l2':     'bool',
             'size':             'size',
             '*version':         'BlockdevQcow2Version',
             '*backing-file':    'str',
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 68a74d03b9..614950be56 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -117,7 +117,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -150,7 +150,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -164,7 +164,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   3
-backing_file_offset       0x178
+backing_file_offset       0x1a8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -188,7 +188,7 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index e489b44386..c7e6512b43 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -58,7 +58,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 
@@ -86,7 +86,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 *** done
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 6b505408dd..ba4b42fd58 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 2. Specifying size via -o ==
 
 qemu-img create -f qcow2 -o size=1024 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 3. Invalid sizes ==
 
@@ -124,84 +124,84 @@ and exabytes, respectively.
 == Check correct interpretation of suffixes for cluster size ==
 
 qemu-img create -f qcow2 -o cluster_size=1024 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check compat level option ==
 
 qemu-img create -f qcow2 -o compat=0.10 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.42 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '0.42'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=foobar TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter 'foobar'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check preallocation option ==
 
 qemu-img create -f qcow2 -o preallocation=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=metadata TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '1234'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check encryption option ==
 
 qemu-img create -f qcow2 -o encryption=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 --object secret,id=sec0,data=123456 -o encryption=on,encrypt.key-secret=sec0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check lazy_refcounts option (only with v3) ==
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=on TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Lazy refcounts only supported with compatibility level 1.1 and above (use version=v3 or greater)
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 *** done
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 0f6b0658a1..1f6ae50027 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -20,6 +20,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: true
+    extended l2: false
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot be opened read/write
 no file open, try 'help open'
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index d6a7c2af95..96a6933554 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -26,7 +26,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -84,7 +84,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -140,7 +140,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -195,7 +195,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -264,7 +264,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 65536/65536 bytes at offset 44040192
@@ -298,7 +298,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -327,7 +327,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 131072/131072 bytes at offset 0
@@ -496,6 +496,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 
 === Try changing the external data file ===
@@ -516,6 +517,7 @@ Format specific information:
     data file: foo
     data file raw: false
     corrupt: false
+    extended l2: false
 
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': 'data-file' is required for this image
 image: TEST_DIR/t.IMGFMT
@@ -528,6 +530,7 @@ Format specific information:
     refcount bits: 16
     data file raw: false
     corrupt: false
+    extended l2: false
 
 === Clearing and setting data-file-raw ===
 
@@ -543,6 +546,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: true
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
@@ -555,6 +559,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 qemu-img: data-file-raw cannot be set on existing images
 image: TEST_DIR/t.IMGFMT
@@ -568,5 +573,6 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 5b21eb96bd..7d3a137434 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -95,17 +95,21 @@ class TestQCow3NotLazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
@@ -113,7 +117,8 @@ class TestQCow3NotLazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=off'
     qemu_options = 'lazy-refcounts=on'
     compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 
 class TestQCow3LazyQMP(TestQMP):
@@ -122,7 +127,8 @@ class TestQCow3LazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=on'
     qemu_options = 'lazy-refcounts=off'
     compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 TestImageInfoSpecific = None
 TestQemuImgInfo = None
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 9d4ed4dc9d..2a01e8bac2 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -3,14 +3,14 @@ QA output created by 082
 === create: Options specified more than once ===
 
 Testing: create -f foo -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
 cluster_size: 65536
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -20,9 +20,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -32,9 +33,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -59,6 +61,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -82,6 +85,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -105,6 +109,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -128,6 +133,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -151,6 +157,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -174,6 +181,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -197,6 +205,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -220,6 +229,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -227,10 +237,10 @@ Supported options:
   size=<size>            - Virtual disk size
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,? TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2, -o help TEST_DIR/t.qcow2 128M
 qemu-img: Invalid option list: backing_file=TEST_DIR/t.qcow2,
@@ -258,6 +268,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -279,7 +290,7 @@ qemu-img: Format driver 'bochs' does not support image creation
 === convert: Options specified more than once ===
 
 Testing: create -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: convert -f foo -f qcow2 TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -302,6 +313,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -313,6 +325,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -339,6 +352,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -362,6 +376,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -385,6 +400,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -408,6 +424,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -431,6 +448,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -454,6 +472,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -477,6 +496,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -500,6 +520,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -538,6 +559,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -582,6 +604,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=130M -o lazy_refcounts=off TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -593,6 +616,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=8M -o lazy_refcounts=on -o size=132M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -604,6 +628,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=4M,size=148M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -630,6 +655,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -654,6 +680,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -678,6 +705,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -702,6 +730,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -726,6 +755,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -750,6 +780,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -774,6 +805,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -798,6 +830,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -839,6 +872,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index 2a5f256cd3..62e7fd82a7 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -11,7 +11,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 
 === Create a single snapshot on virtio0 ===
 
-Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Invalid command - missing device and nodename ===
@@ -25,32 +25,32 @@ Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file
 
 === Create several transactional group snapshots ===
 
-Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Create a couple of snapshots using blockdev-snapshot ===
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index a9a8216bea..7bf05253cc 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -7,7 +7,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912
 === Performing Live Snapshot 1 ===
 
 {"return": {}}
-Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Performing block-commit on active layer ===
@@ -26,6 +26,6 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/
 
 === Performing Live Snapshot 2 ===
 
-Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 *** done
diff --git a/tests/qemu-iotests/182.out b/tests/qemu-iotests/182.out
index ffef23e32b..78e2d51761 100644
--- a/tests/qemu-iotests/182.out
+++ b/tests/qemu-iotests/182.out
@@ -10,7 +10,7 @@ Is another process using the image [TEST_DIR/t.qcow2]?
 
 {"return": {}}
 {"return": {}}
-Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 {"return": {}}
 {"return": {}}
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index ddfbf3c765..eaab057fed 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -7,12 +7,12 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
 
 === Creating backing chain ===
 
-Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 wrote 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Start commit job and exit qemu ===
@@ -37,7 +37,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 === Start mirror job and exit qemu ===
 
 {"return": {}}
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
@@ -48,7 +48,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 === Start backup job and exit qemu ===
 
 {"return": {}}
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
index e86b175e39..e46dccdb08 100644
--- a/tests/qemu-iotests/198.out
+++ b/tests/qemu-iotests/198.out
@@ -72,6 +72,7 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 
 == checking image layer ==
 image: json:{"encrypt.key-secret": "sec1", "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}}
@@ -115,4 +116,5 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 *** done
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index 61e7241e0b..d2efc0394a 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -21,6 +21,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (inline blockdev-add, explicit defaults) ===
 
@@ -43,6 +44,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v3 non-default options) ===
 
@@ -65,6 +67,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 1
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v2 non-default options) ===
 
@@ -141,6 +144,7 @@ Format specific information:
         payload offset: 528384
         master key iters: XXX
     corrupt: false
+    extended l2: false
 
 === Invalid BlockdevRef ===
 
diff --git a/tests/qemu-iotests/242.out b/tests/qemu-iotests/242.out
index 7ac8404d11..0d32dd9148 100644
--- a/tests/qemu-iotests/242.out
+++ b/tests/qemu-iotests/242.out
@@ -15,6 +15,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 No bitmap in JSON format output
 
@@ -40,6 +41,7 @@ Format specific information:
             granularity: 32768
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -77,6 +79,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -119,6 +122,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -162,5 +166,6 @@ Format specific information:
             granularity: 16384
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Test complete
diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
index 348909fdef..4e1b917a0f 100644
--- a/tests/qemu-iotests/255.out
+++ b/tests/qemu-iotests/255.out
@@ -3,9 +3,9 @@ Finishing a commit job with background reads
 
 === Create backing chain and start VM ===
 
-Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 === Start background read requests ===
 
@@ -23,9 +23,9 @@ Closing the VM while a job is being cancelled
 
 === Create images and start VM ===
 
-Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 wrote 1048576/1048576 bytes at offset 0
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 9f418b4881..45c0fe746b 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -137,6 +137,7 @@ _filter_img_create()
         -e "s# adapter_type=[^ ]*##g" \
         -e "s# hwversion=[^ ]*##g" \
         -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+        -e "s# extended_l2=\\(on\\|off\\)##g" \
         -e "s# block_size=[0-9]\\+##g" \
         -e "s# block_state_zero=\\(on\\|off\\)##g" \
         -e "s# log_size=[0-9]\\+##g" \
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (23 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-05 13:11   ` Max Reitz
  2019-10-26 21:25 ` [RFC PATCH v2 26/26] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
  2019-11-05 13:32 ` [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Max Reitz
  26 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Traditional qcow2 images don't allow preallocation if a backing file
is set. This is because once a cluster is allocated there is no way to
tell that its data should be read from the backing file.

Extended L2 entries have individual allocation bits for each
subcluster, and therefore it is perfectly possible to have an
allocated cluster with all its subclusters unallocated.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c              | 7 ++++---
 tests/qemu-iotests/206.out | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index b1fa7ab5da..8cf51c5d64 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3307,10 +3307,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         qcow2_opts->preallocation = PREALLOC_MODE_OFF;
     }
     if (qcow2_opts->has_backing_file &&
-        qcow2_opts->preallocation != PREALLOC_MODE_OFF)
+        qcow2_opts->preallocation != PREALLOC_MODE_OFF &&
+        !qcow2_opts->extended_l2)
     {
-        error_setg(errp, "Backing file and preallocation cannot be used at "
-                   "the same time");
+        error_setg(errp, "Backing file and preallocation can only be used at "
+                   "the same time if extended_l2 is on");
         ret = -EINVAL;
         goto out;
     }
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index d2efc0394a..cfddfbfaa4 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -198,7 +198,7 @@ Job failed: Different refcount widths than 16 bits require compatibility level 1
 === Invalid backing file options ===
 {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"backing-file": "/dev/null", "driver": "qcow2", "file": "node0", "preallocation": "full", "size": 67108864}}}
 {"return": {}}
-Job failed: Backing file and preallocation cannot be used at the same time
+Job failed: Backing file and preallocation can only be used at the same time if extended_l2 is on
 {"execute": "job-dismiss", "arguments": {"id": "job0"}}
 {"return": {}}
 
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* [RFC PATCH v2 26/26] iotests: Add tests for qcow2 images with extended L2 entries
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (24 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set Alberto Garcia
@ 2019-10-26 21:25 ` Alberto Garcia
  2019-11-05 13:32 ` [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Max Reitz
  26 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 tests/qemu-iotests/271     | 235 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/271.out | 183 +++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 419 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 0000000000..c49433cdc9
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,235 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019 Igalia, S.L.
+# Author: Alberto Garcia <berto@igalia.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=berto@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+        rm -f "$TEST_IMG.raw"
+        rm -f "$TEST_IMG.backing"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+
+IMGOPTS="extended_l2=on"
+l2_offset=262144 # 0x40000
+
+_verify_img()
+{
+    $QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are identical'
+    $QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+        grep -v 'No errors were found on the image'
+}
+
+_read_l2_entry()
+{
+    entry_no=$1
+    nentries=$2
+    offset=$(($l2_offset + $entry_no * 16))
+    length=$((nentries * 16))
+    $QEMU_IO -f raw -c "read -v $offset $length" "$TEST_IMG" | _filter_qemu_io | head -n -2
+}
+
+_test_write()
+{
+    cmd="$1"
+    l2_entry_idx="$2"
+    l2_entry_num="$3"
+    raw_cmd=`echo $1 | sed s/-c//` # Raw images don't support -c
+    echo "$cmd"
+    $QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+    $QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+    _verify_img
+    if [ -n "$l2_entry_idx" ]; then
+        _read_l2_entry "$l2_entry_idx" "$l2_entry_num"
+    fi
+}
+
+_reset_img()
+{
+    $QEMU_IMG create -f raw "$TEST_IMG.raw" 1M | _filter_img_create
+    if [ "$use_backing_file" = "yes" ]; then
+        $QEMU_IMG create -f raw "$TEST_IMG.backing" 1M | _filter_img_create
+        $QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.backing" | _filter_qemu_io
+        $QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.raw" | _filter_qemu_io
+        _make_test_img -b "$TEST_IMG.backing" 1M
+    else
+        _make_test_img 1M
+    fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+    echo
+    echo "### Standard write tests (backing file: $use_backing_file) ###"
+    echo
+    _reset_img
+    ### Write subcluster #0 (beginning of subcluster) ###
+    _test_write 'write -q -P 1 0 1k' 0 1
+
+    ### Write subcluster #1 (middle of subcluster) ###
+    _test_write 'write -q -P 2 3k 512' 0 1
+
+    ### Write subcluster #2 (end of subcluster) ###
+    _test_write 'write -q -P 3 5k 1k' 0 1
+
+    ### Write subcluster #3 (full subcluster) ###
+    _test_write 'write -q -P 4 6k 2k' 0 1
+
+    ### Write subclusters #4-6 (full subclusters) ###
+    _test_write 'write -q -P 5 8k 6k' 0 1
+
+    ### Write subclusters #7-9 (partial subclusters) ###
+    _test_write 'write -q -P 6 15k 4k' 0 1
+
+    ### Write subcluster #16 (partial subcluster) ###
+    _test_write 'write -q -P 7 32k 1k' 0 2
+
+    ### Write subcluster #31-#34 (cluster overlap) ###
+    _test_write 'write -q -P 8 63k 4k' 0 2
+
+    ### Zero subcluster #1 (TODO: use the "all zeros" bit)
+    _test_write 'write -q -z 2k 2k' 0 1
+
+    ### Zero cluster #0
+    _test_write 'write -q -z 0 64k' 0 1
+
+    ### Fill cluster #0 with data
+    _test_write 'write -q -P 9 0 64k' 0 1
+
+    ### Zero and unmap half of cluster #0 (this won't unmap it)
+    _test_write 'write -q -z -u 0 32k' 0 1
+
+    ### Zero and unmap cluster #0
+    _test_write 'write -q -z -u 0 64k' 0 1
+
+    ### Write subcluster #2 (middle of subcluster)
+    _test_write 'write -q -P 10 3k 512' 0 1
+
+    ### Fill cluster #0 with data
+    _test_write 'write -q -P 11 0 64k' 0 1
+
+    ### Discard cluster #0
+    _test_write 'discard -q 0 64k' 0 1
+
+    ### Write compressed data to cluster #0
+    _test_write 'write -q -c -P 12 0 64k' 0 1
+
+    ### Write subcluster #2 (middle of subcluster)
+    _test_write 'write -q -P 13 3k 512' 0 1
+done
+
+# Test that corrupted L2 entries are detected in both read and write
+# operations
+for corruption_test_cmd in read write; do
+    echo
+    echo "### Corrupted L2 entries - $corruption_test_cmd test (allocated) ###"
+    echo
+    echo "# 'cluster is zero' bit set on the standard cluster descriptor"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q 0 2k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+7)) "\x01"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# Both 'subcluster is zero' and 'subcluster is allocated' bits set"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q 0 2k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x80"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "### Corrupted L2 entries - $corruption_test_cmd test (unallocated) ###"
+    echo
+    echo "# 'cluster is zero' bit set on the standard cluster descriptor"
+    echo
+    _make_test_img 1M
+    # Write to cluster #4 in order to initialize the L2 table
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+7)) "\x01"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# 'subcluster is allocated' bit set"
+    echo
+    _make_test_img 1M
+    # Write to cluster #4 in order to initialize the L2 table
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+12)) "\x80"
+    _read_l2_entry 0 1
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# Both 'subcluster is zero' and 'subcluster is allocated' bits set"
+    echo
+    _make_test_img 1M
+    # Write to cluster #4 in order to initialize the L2 table
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x80\x00\x00\x00\x80"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "### Compressed cluster with subcluster bitmap != 0 - $corruption_test_cmd test ###"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q -c 0 64k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x01"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+done
+
+echo
+echo "### Image creation options ###"
+echo
+echo "# cluster_size < 16k"
+IMGOPTS="extended_l2=on,cluster_size=8k" _make_test_img 1M
+
+echo "# backing file and preallocation=metadata"
+IMGOPTS="extended_l2=on,preallocation=metadata" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo "# backing file and preallocation=falloc"
+IMGOPTS="extended_l2=on,preallocation=falloc" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo "# backing file and preallocation=full"
+IMGOPTS="extended_l2=on,preallocation=full" _make_test_img -b "$TEST_IMG.backing" 1M
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
+
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
new file mode 100644
index 0000000000..fa4da573e0
--- /dev/null
+++ b/tests/qemu-iotests/271.out
@@ -0,0 +1,183 @@
+QA output created by 271
+
+### Standard write tests (backing file: yes) ###
+
+Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT.backing', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing
+write -q -P 1 0 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 80 00 00 00  ................
+write -q -P 2 3k 512
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -P 3 5k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 e0 00 00 00  ................
+write -q -P 4 6k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 f0 00 00 00  ................
+write -q -P 5 8k 6k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 fe 00 00 00  ................
+write -q -P 6 15k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 00 00  ................
+write -q -P 7 32k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 00  ................
+00040010:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 8 63k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+00040010:  80 00 00 00 00 06 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -z 2k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+write -q -z 0 64k
+00040000:  80 00 00 00 00 05 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 9 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 32k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 10 3k 512
+00040000:  80 00 00 00 00 05 00 00 bf ff ff ff 40 00 00 00  ................
+write -q -P 11 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+discard -q 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -c -P 12 0 64k
+00040000:  40 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 13 3k 512
+00040000:  80 00 00 00 00 07 00 00 00 00 00 00 ff ff ff ff  ................
+
+### Standard write tests (backing file: no) ###
+
+Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+write -q -P 1 0 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 80 00 00 00  ................
+write -q -P 2 3k 512
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -P 3 5k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 e0 00 00 00  ................
+write -q -P 4 6k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 f0 00 00 00  ................
+write -q -P 5 8k 6k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 fe 00 00 00  ................
+write -q -P 6 15k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 00 00  ................
+write -q -P 7 32k 1k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 00  ................
+00040010:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 8 63k 4k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+00040010:  80 00 00 00 00 06 00 00 00 00 00 00 c0 00 00 00  ................
+write -q -z 2k 2k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff c0 80 01  ................
+write -q -z 0 64k
+00040000:  80 00 00 00 00 05 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 9 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 32k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+write -q -z -u 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -P 10 3k 512
+00040000:  80 00 00 00 00 05 00 00 bf ff ff ff 40 00 00 00  ................
+write -q -P 11 0 64k
+00040000:  80 00 00 00 00 05 00 00 00 00 00 00 ff ff ff ff  ................
+discard -q 0 64k
+00040000:  00 00 00 00 00 00 00 00 ff ff ff ff 00 00 00 00  ................
+write -q -c -P 12 0 64k
+00040000:  40 00 00 00 00 05 00 00 00 00 00 00 00 00 00 00  ................
+write -q -P 13 3k 512
+00040000:  80 00 00 00 00 07 00 00 00 00 00 00 ff ff ff ff  ................
+
+### Corrupted L2 entries - read test (allocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Corrupted L2 entries - read test (unallocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# 'subcluster is allocated' bit set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+00040000:  00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00  ................
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Compressed cluster with subcluster bitmap != 0 - read test ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Corrupted L2 entries - write test (allocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Corrupted L2 entries - write test (unallocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# 'subcluster is allocated' bit set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+00040000:  00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00  ................
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Compressed cluster with subcluster bitmap != 0 - write test ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Image creation options ###
+
+# cluster_size < 16k
+qemu-img: TEST_DIR/t.IMGFMT: Extended L2 entries are only supported with cluster sizes of at least 16384 bytes
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+# backing file and preallocation=metadata
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=metadata
+# backing file and preallocation=falloc
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=falloc
+# backing file and preallocation=full
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=full
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index af322af756..945e6d68cc 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -282,3 +282,4 @@
 267 rw auto quick snapshot
 268 rw auto quick
 270 rw backing quick
+271 rw auto
-- 
2.20.1



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()
  2019-10-26 21:25 ` [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2019-10-28 12:50   ` Vladimir Sementsov-Ogievskiy
  2019-10-30 15:56     ` Alberto Garcia
  2019-10-30 12:04   ` Max Reitz
  1 sibling, 1 reply; 65+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-28 12:50 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Denis Lunev, qemu-block, Max Reitz

27.10.2019 0:25, Alberto Garcia wrote:
> handle_alloc() creates a QCowL2Meta structure in order to update the
> image metadata and perform the necessary copy-on-write operations.
> 
> This patch moves that code to a separate function so it can be used
> from other places.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cluster.c | 76 +++++++++++++++++++++++++++++--------------
>   1 file changed, 52 insertions(+), 24 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 8982b7b762..6c1dcdc781 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1019,6 +1019,55 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>                           QCOW2_DISCARD_NEVER);
>   }
>   
> +/*
> + * For a given write request, create a new QCowL2Meta structure and
> + * add it to @m.
> + *
> + * @host_offset points to the beginning of the first cluster.
> + *
> + * @guest_offset and @bytes indicate the offset and length of the
> + * request.
> + *
> + * If @keep_old is true it means that the clusters were already
> + * allocated and will be overwritten. If false then the clusters are
> + * new and we have to decrease the reference count of the old ones.
> + */
> +static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
> +                              uint64_t guest_offset, uint64_t bytes,
> +                              QCowL2Meta **m, bool keep_old)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    unsigned cow_start_from = 0;
> +    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
> +    unsigned cow_end_from = cow_start_to + bytes;
> +    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);

New code is easier to read!

Interesting, seems QEMU_ALIGN_UP is more popular.. But comment above ROUND_UP
makes sense in using it here..

> +    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
> +    QCowL2Meta *old_m = *m;
> +
> +    *m = g_malloc0(sizeof(**m));
> +    **m = (QCowL2Meta) {
> +        .next           = old_m,
> +
> +        .alloc_offset   = host_offset,
> +        .offset         = start_of_cluster(s, guest_offset),
> +        .nb_clusters    = nb_clusters,
> +
> +        .keep_old_clusters = keep_old,
> +
> +        .cow_start = {
> +            .offset     = cow_start_from,
> +            .nb_bytes   = cow_start_to - cow_start_from,
> +        },
> +        .cow_end = {
> +            .offset     = cow_end_from,
> +            .nb_bytes   = cow_end_to - cow_end_from,
> +        },
> +    };
> +
> +    qemu_co_queue_init(&(*m)->dependent_requests);
> +    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
> +}
> +
>   /*
>    * Returns the number of contiguous clusters that can be used for an allocating
>    * write, but require COW to be performed (this includes yet unallocated space,
> @@ -1417,35 +1466,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>       uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
>       int avail_bytes = nb_clusters << s->cluster_bits;
>       int nb_bytes = MIN(requested_bytes, avail_bytes);
> -    QCowL2Meta *old_m = *m;
> -
> -    *m = g_malloc0(sizeof(**m));
> -
> -    **m = (QCowL2Meta) {
> -        .next           = old_m,
> -
> -        .alloc_offset   = alloc_cluster_offset,
> -        .offset         = start_of_cluster(s, guest_offset),
> -        .nb_clusters    = nb_clusters,
> -
> -        .keep_old_clusters  = keep_old_clusters,
> -
> -        .cow_start = {
> -            .offset     = 0,
> -            .nb_bytes   = offset_into_cluster(s, guest_offset),
> -        },
> -        .cow_end = {
> -            .offset     = nb_bytes,

hmm this logic is changed.. after the patch, it would be not nb_bytes, but

offset_into_cluster(s, guest_offset) + MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset))

> -            .nb_bytes   = avail_bytes - nb_bytes,
> -        },
> -    };
> -    qemu_co_queue_init(&(*m)->dependent_requests);
> -    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>   
>       *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
>       *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>       assert(*bytes != 0);
>   
> +    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
> +                      m, keep_old_clusters);
> +
>       return 1;
>   
>   fail:
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  2019-10-26 21:25 ` [RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
@ 2019-10-28 13:55   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 65+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-28 13:55 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Denis Lunev, qemu-block, Max Reitz

27.10.2019 0:25, Alberto Garcia wrote:
> We are going to need it in other places.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   block/qcow2-cluster.c | 34 +++++++++++++++++++---------------
>   1 file changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 6c1dcdc781..aa1010a515 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1068,6 +1068,24 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>       QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>   }
>   
> +/* Returns true if writing to a cluster requires COW */
> +static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
> +{
> +    switch (qcow2_get_cluster_type(bs, l2_entry)) {
> +    case QCOW2_CLUSTER_NORMAL:
> +        if (l2_entry & QCOW_OFLAG_COPIED) {
> +            return false;
> +        }
> +    case QCOW2_CLUSTER_UNALLOCATED:
> +    case QCOW2_CLUSTER_COMPRESSED:
> +    case QCOW2_CLUSTER_ZERO_PLAIN:
> +    case QCOW2_CLUSTER_ZERO_ALLOC:
> +        return true;
> +    default:
> +        abort();
> +    }
> +}

Not sure how much is it better than just
return !(qcow2_get_cluster_type(bs, l2_entry) == QCOW2_CLUSTER_NORMAL && (l2_entry & QCOW_OFLAG_COPIED));

> +
>   /*
>    * Returns the number of contiguous clusters that can be used for an allocating
>    * write, but require COW to be performed (this includes yet unallocated space,
> @@ -1080,25 +1098,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
>   
>       for (i = 0; i < nb_clusters; i++) {
>           uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> -        QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
> -
> -        switch(cluster_type) {
> -        case QCOW2_CLUSTER_NORMAL:
> -            if (l2_entry & QCOW_OFLAG_COPIED) {
> -                goto out;
> -            }
> +        if (!cluster_needs_cow(bs, l2_entry)) {
>               break;
> -        case QCOW2_CLUSTER_UNALLOCATED:
> -        case QCOW2_CLUSTER_COMPRESSED:
> -        case QCOW2_CLUSTER_ZERO_PLAIN:
> -        case QCOW2_CLUSTER_ZERO_ALLOC:
> -            break;
> -        default:
> -            abort();
>           }
>       }
>   
> -out:
>       assert(i <= nb_clusters);
>       return i;
>   }
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()
  2019-10-26 21:25 ` [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta() Alberto Garcia
  2019-10-28 12:50   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-30 12:04   ` Max Reitz
  2019-10-30 16:02     ` Alberto Garcia
  1 sibling, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-10-30 12:04 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 4926 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> handle_alloc() creates a QCowL2Meta structure in order to update the
> image metadata and perform the necessary copy-on-write operations.
> 
> This patch moves that code to a separate function so it can be used
> from other places.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 76 +++++++++++++++++++++++++++++--------------
>  1 file changed, 52 insertions(+), 24 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 8982b7b762..6c1dcdc781 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1019,6 +1019,55 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>                          QCOW2_DISCARD_NEVER);
>  }
>  
> +/*
> + * For a given write request, create a new QCowL2Meta structure and
> + * add it to @m.
> + *
> + * @host_offset points to the beginning of the first cluster.

(I intended not to comment on such things on an RFC, but here I am...)

I’d call it host_cluster_offset to make clear that it points to a
cluster and isn’t the host offset for @guest_offset.

And now that I’ve gone this far already, I might as well say that I’d
like if it the comment noted that this function not only creates the
L2Meta structure but also adds it to the cluster_allocs list.

> + * @guest_offset and @bytes indicate the offset and length of the
> + * request.
> + *
> + * If @keep_old is true it means that the clusters were already
> + * allocated and will be overwritten. If false then the clusters are
> + * new and we have to decrease the reference count of the old ones.
> + */
> +static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
> +                              uint64_t guest_offset, uint64_t bytes,

And now I’m so deep into non-RFC-level review territory, I might as well
say I’d prefer @bytes to be an unsigned (or maybe even a plain int),
because anything more wouldn’t work.  (Not least because cow_end_to is
an unsigned).

Sorry...

Max

> +                              QCowL2Meta **m, bool keep_old)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    unsigned cow_start_from = 0;
> +    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
> +    unsigned cow_end_from = cow_start_to + bytes;
> +    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
> +    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
> +    QCowL2Meta *old_m = *m;
> +
> +    *m = g_malloc0(sizeof(**m));
> +    **m = (QCowL2Meta) {
> +        .next           = old_m,
> +
> +        .alloc_offset   = host_offset,
> +        .offset         = start_of_cluster(s, guest_offset),
> +        .nb_clusters    = nb_clusters,
> +
> +        .keep_old_clusters = keep_old,
> +
> +        .cow_start = {
> +            .offset     = cow_start_from,
> +            .nb_bytes   = cow_start_to - cow_start_from,
> +        },
> +        .cow_end = {
> +            .offset     = cow_end_from,
> +            .nb_bytes   = cow_end_to - cow_end_from,
> +        },
> +    };
> +
> +    qemu_co_queue_init(&(*m)->dependent_requests);
> +    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
> +}
> +
>  /*
>   * Returns the number of contiguous clusters that can be used for an allocating
>   * write, but require COW to be performed (this includes yet unallocated space,
> @@ -1417,35 +1466,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>      uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
>      int avail_bytes = nb_clusters << s->cluster_bits;
>      int nb_bytes = MIN(requested_bytes, avail_bytes);
> -    QCowL2Meta *old_m = *m;
> -
> -    *m = g_malloc0(sizeof(**m));
> -
> -    **m = (QCowL2Meta) {
> -        .next           = old_m,
> -
> -        .alloc_offset   = alloc_cluster_offset,
> -        .offset         = start_of_cluster(s, guest_offset),
> -        .nb_clusters    = nb_clusters,
> -
> -        .keep_old_clusters  = keep_old_clusters,
> -
> -        .cow_start = {
> -            .offset     = 0,
> -            .nb_bytes   = offset_into_cluster(s, guest_offset),
> -        },
> -        .cow_end = {
> -            .offset     = nb_bytes,
> -            .nb_bytes   = avail_bytes - nb_bytes,
> -        },
> -    };
> -    qemu_co_queue_init(&(*m)->dependent_requests);
> -    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>  
>      *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
>      *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>      assert(*bytes != 0);
>  
> +    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
> +                      m, keep_old_clusters);
> +
>      return 1;
>  
>  fail:
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-10-26 21:25 ` [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2019-10-30 14:24   ` Max Reitz
  2019-11-13 14:25     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-10-30 14:24 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 15401 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> When writing to a qcow2 file there are two functions that take a
> virtual offset and return a host offset, possibly allocating new
> clusters if necessary:
> 
>    - handle_copied() looks for normal data clusters that are already
>      allocated and have a reference count of 1. In those clusters we
>      can simply write the data and there is no need to perform any
>      copy-on-write.
> 
>    - handle_alloc() looks for clusters that do need copy-on-write,
>      either because they haven't been allocated yet, because their
>      reference count is != 1 or because they are ZERO_ALLOC clusters.
> 
> The ZERO_ALLOC case is a bit special because those are clusters that
> are already allocated and they could perfectly be dealt with in
> handle_copied() (as long as copy-on-write is performed when required).
> 
> In fact, there is extra code specifically for them in handle_alloc()
> that tries to reuse the existing allocation if possible and frees them
> otherwise.
> 
> This patch changes the handling of ZERO_ALLOC clusters so the
> semantics of these two functions are now like this:
> 
>    - handle_copied() looks for clusters that are already allocated and
>      which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
>      reference count of 1).
> 
>    - handle_alloc() looks for clusters for which we need a new
>      allocation (all other cases).
> 
> One importante difference after this change is that clusters found in
> handle_copied() may now require copy-on-write, but this will be anyway
> necessary once we add support for subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 177 +++++++++++++++++++++++-------------------
>  1 file changed, 96 insertions(+), 81 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index aa1010a515..ee6b46f917 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1021,7 +1021,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>  
>  /*
>   * For a given write request, create a new QCowL2Meta structure and
> - * add it to @m.
> + * add it to @m. If the write request does not need copy-on-write or
> + * changes to the L2 metadata then this function does nothing.
>   *
>   * @host_offset points to the beginning of the first cluster.
>   *
> @@ -1034,15 +1035,51 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>   */
>  static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>                                uint64_t guest_offset, uint64_t bytes,
> -                              QCowL2Meta **m, bool keep_old)
> +                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    unsigned cow_start_from = 0;
> +    int l2_index = offset_to_l2_slice_index(s, guest_offset);
> +    uint64_t l2_entry;
> +    unsigned cow_start_from, cow_end_to;
>      unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>      unsigned cow_end_from = cow_start_to + bytes;
> -    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>      unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>      QCowL2Meta *old_m = *m;
> +    QCow2ClusterType type;
> +
> +    /* Return if there's no COW (all clusters are normal and we keep them) */
> +    if (keep_old) {
> +        int i;
> +        for (i = 0; i < nb_clusters; i++) {
> +            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);

I’d assert somewhere that l2_index + nb_clusters - 1 won’t overflow.

> +            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {

Wouldn’t cluster_needs_cow() be better?

> +                break;
> +            }
> +        }
> +        if (i == nb_clusters) {
> +            return;
> +        }
> +    }

So I understand we always need to create an L2Meta structure in all
other cases because we at least need to turn those clusters into normal
clusters?  (Even if they’re already allocated, as in the case of
allocated zero clusters.)

> +
> +    /* Get the L2 entry from the first cluster */
> +    l2_entry = be64_to_cpu(l2_slice[l2_index]);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
> +
> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> +        cow_start_from = cow_start_to;
> +    } else {
> +        cow_start_from = 0;
> +    }
> +
> +    /* Get the L2 entry from the last cluster */
> +    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
> +
> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> +        cow_end_to = cow_end_from;
> +    } else {
> +        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
> +    }
>  
>      *m = g_malloc0(sizeof(**m));
>      **m = (QCowL2Meta) {
> @@ -1068,18 +1105,18 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>      QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>  }
>  
> -/* Returns true if writing to a cluster requires COW */
> +/* Returns true if the cluster is unallocated or has refcount > 1 */
>  static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
>  {
>      switch (qcow2_get_cluster_type(bs, l2_entry)) {
>      case QCOW2_CLUSTER_NORMAL:
> +    case QCOW2_CLUSTER_ZERO_ALLOC:
>          if (l2_entry & QCOW_OFLAG_COPIED) {
>              return false;

Don’t zero-allocated clusters need COW always?  (Because the at the
given host offset isn’t guaranteed to be zero.)

>          }
>      case QCOW2_CLUSTER_UNALLOCATED:
>      case QCOW2_CLUSTER_COMPRESSED:
>      case QCOW2_CLUSTER_ZERO_PLAIN:
> -    case QCOW2_CLUSTER_ZERO_ALLOC:
>          return true;
>      default:
>          abort();
> @@ -1087,20 +1124,34 @@ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
>  }
>  
>  /*
> - * Returns the number of contiguous clusters that can be used for an allocating
> - * write, but require COW to be performed (this includes yet unallocated space,
> - * which must copy from the backing file)
> + * Returns the number of contiguous clusters that can be written to
> + * using one single write request, starting from @l2_index.
> + * At most @nb_clusters are checked.
> + *
> + * If @want_cow is true this counts clusters that are either
> + * unallocated, or allocated but with refcount > 1.

+(So they need to be newly allocated and COWed)?

(Or is the past participle of COW COWn?  Or maybe CedOW?)

> + *
> + * If @want_cow is false this counts clusters that are already
> + * allocated and can be written to using their current locations

s/using their current locations/in-place/?

> + * (including QCOW2_CLUSTER_ZERO_ALLOC).
>   */
>  static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,

Hm, well, it’s not really cow anymore.  Would
count_single_write_clusters() work?

> -    uint64_t *l2_slice, int l2_index)
> +                              uint64_t *l2_slice, int l2_index, bool want_cow)
>  {
> +    BDRVQcow2State *s = bs->opaque;
> +    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
> +    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
>      int i;
>  
>      for (i = 0; i < nb_clusters; i++) {
> -        uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> -        if (!cluster_needs_cow(bs, l2_entry)) {
> +        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> +        if (cluster_needs_cow(bs, l2_entry) != want_cow) {
>              break;
>          }
> +        if (!want_cow && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
> +            break;
> +        }
> +        expected_offset += s->cluster_size;
>      }
>  
>      assert(i <= nb_clusters);
> @@ -1228,18 +1279,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>  
>      cluster_offset = be64_to_cpu(l2_slice[l2_index]);
>  
> -    /* Check how many clusters are already allocated and don't need COW */
> -    if (qcow2_get_cluster_type(bs, cluster_offset) == QCOW2_CLUSTER_NORMAL
> -        && (cluster_offset & QCOW_OFLAG_COPIED))
> -    {
> +    if (!cluster_needs_cow(bs, cluster_offset)) {
>          /* If a specific host_offset is required, check it */
>          bool offset_matches =
>              (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
>  
>          if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
> -            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
> +            qcow2_signal_corruption(bs, true, -1, -1, "%s cluster offset "
>                                      "%#llx unaligned (guest offset: %#" PRIx64
> -                                    ")", cluster_offset & L2E_OFFSET_MASK,
> +                                    ")", cluster_offset & QCOW_OFLAG_ZERO ?
> +                                    "Preallocated zero" : "Data",
> +                                    cluster_offset & L2E_OFFSET_MASK,
>                                      guest_offset);
>              ret = -EIO;
>              goto out;
> @@ -1252,15 +1302,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>          }
>  
>          /* We keep all QCOW_OFLAG_COPIED clusters */
> -        keep_clusters =
> -            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
> -                                      &l2_slice[l2_index],
> -                                      QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
> +        keep_clusters = count_cow_clusters(bs, nb_clusters, l2_slice,
> +                                           l2_index, false);
>          assert(keep_clusters <= nb_clusters);
>  
>          *bytes = MIN(*bytes,
>                   keep_clusters * s->cluster_size
>                   - offset_into_cluster(s, guest_offset));
> +        assert(*bytes != 0);
> +
> +        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
> +                          *bytes, l2_slice, m, true);

We wouldn’t need calculate_l2_meta() here if the clusters really all
didn’t need COW.

We do need it because cluster_needs_cow() returns false for zero
clusters, which isn’t what the function name says.

>  
>          ret = 1;
>      } else {
> @@ -1361,12 +1413,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>      BDRVQcow2State *s = bs->opaque;
>      int l2_index;
>      uint64_t *l2_slice;
> -    uint64_t entry;
>      uint64_t nb_clusters;
>      int ret;
> -    bool keep_old_clusters = false;
>  
> -    uint64_t alloc_cluster_offset = INV_OFFSET;
> +    uint64_t alloc_cluster_offset;
>  
>      trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
>                               *bytes);
> @@ -1392,67 +1442,31 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>          return ret;
>      }
>  
> -    entry = be64_to_cpu(l2_slice[l2_index]);
> -    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index);
> +    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index, true);
>  
>      /* This function is only called when there were no non-COW clusters, so if
>       * we can't find any unallocated or COW clusters either, something is
>       * wrong with our code. */
>      assert(nb_clusters > 0);
>  
> -    if (qcow2_get_cluster_type(bs, entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
> -        (entry & QCOW_OFLAG_COPIED) &&
> -        (*host_offset == INV_OFFSET ||
> -         start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
> -    {
> -        int preallocated_nb_clusters;
> -
> -        if (offset_into_cluster(s, entry & L2E_OFFSET_MASK)) {
> -            qcow2_signal_corruption(bs, true, -1, -1, "Preallocated zero "
> -                                    "cluster offset %#llx unaligned (guest "
> -                                    "offset: %#" PRIx64 ")",
> -                                    entry & L2E_OFFSET_MASK, guest_offset);
> -            ret = -EIO;
> -            goto fail;
> -        }
> -
> -        /* Try to reuse preallocated zero clusters; contiguous normal clusters
> -         * would be fine, too, but count_cow_clusters() above has limited
> -         * nb_clusters already to a range of COW clusters */
> -        preallocated_nb_clusters =
> -            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
> -                                      &l2_slice[l2_index], QCOW_OFLAG_COPIED);
> -        assert(preallocated_nb_clusters > 0);
> -
> -        nb_clusters = preallocated_nb_clusters;
> -        alloc_cluster_offset = entry & L2E_OFFSET_MASK;
> -
> -        /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
> -         * should not free them. */
> -        keep_old_clusters = true;
> +    /* Allocate, if necessary at a given offset in the image file */

Well, it’s always necessary now.

> +    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
> +        start_of_cluster(s, *host_offset);
> +    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> +                                  &nb_clusters);
> +    if (ret < 0) {
> +        goto out;
>      }
>  
> -    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
> -
> -    if (alloc_cluster_offset == INV_OFFSET) {
> -        /* Allocate, if necessary at a given offset in the image file */
> -        alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
> -                               start_of_cluster(s, *host_offset);
> -        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> -                                      &nb_clusters);
> -        if (ret < 0) {
> -            goto fail;
> -        }
> -
> -        /* Can't extend contiguous allocation */
> -        if (nb_clusters == 0) {
> -            *bytes = 0;
> -            return 0;
> -        }
> -
> -        assert(alloc_cluster_offset != INV_OFFSET);
> +    /* Can't extend contiguous allocation */
> +    if (nb_clusters == 0) {
> +        *bytes = 0;
> +        ret = 0;
> +        goto out;
>      }
>  
> +    assert(alloc_cluster_offset != INV_OFFSET);
> +
>      /*
>       * Save info needed for meta data update.
>       *
> @@ -1475,13 +1489,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>      *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>      assert(*bytes != 0);
>  
> -    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
> -                      m, keep_old_clusters);
> +    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
> +                      m, false);
>  
> -    return 1;
> +    ret = 1;
>  
> -fail:
> -    if (*m && (*m)->nb_clusters > 0) {
> +out:
> +    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);

Is this a bug fix?

Max

> +    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
>          QLIST_REMOVE(*m, next_in_flight);
>      }
>      return ret;
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()
  2019-10-28 12:50   ` Vladimir Sementsov-Ogievskiy
@ 2019-10-30 15:56     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-30 15:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel\
  Cc: Kevin Wolf, Anton Nefedov, Denis Lunev, qemu-block, Max Reitz

On Mon 28 Oct 2019 01:50:54 PM CET, Vladimir Sementsov-Ogievskiy wrote:
>> -        .cow_end = {
>> -            .offset     = nb_bytes,
>
> hmm this logic is changed.. after the patch, it would be not nb_bytes, but
>
> offset_into_cluster(s, guest_offset) + MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset))

It hasn't changed. The value of .cow_end.offset before the patch is
nb_bytes (these two lines are equivalent):

  nb_bytes = MIN(requested_bytes, avail_bytes);
  nb_bytes = MIN(*bytes + offset_into_cluster, avail_bytes);

After the patch we update *bytes to pass it to calculate_l2_meta(). Here
is the value (again, these are all equivalent):

  *bytes = MIN(*bytes, nb_bytes - offset_into_cluster)
  *bytes = MIN(*bytes, MIN(*bytes + offset_into_cluster, avail_bytes) -
           offset_into_cluster)
  *bytes = MIN(*bytes, MIN(*bytes, avail_bytes - offset_into_cluster))
  *bytes = MIN(*bytes, avail_bytes - offset_into_cluster)

And here's the value of .cow_end.offset set in calculate_l2_meta():

  cow_end_from = *bytes + offset_into_cluster
  cow_end_from = MIN(*bytes, avail_bytes - offset_into_cluster) + 
                 offset_into_cluster
  cow_end_from = MIN(*bytes + offset_into_cluster, avail_bytes)

This last definition of cow_end_from is the same as nb_bytes as used
before the patch.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta()
  2019-10-30 12:04   ` Max Reitz
@ 2019-10-30 16:02     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-30 16:02 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 30 Oct 2019 01:04:02 PM CET, Max Reitz wrote:
> (I intended not to comment on such things on an RFC, but here I am...)

No problem with that :-)

> I’d call it host_cluster_offset to make clear that it points to a
> cluster and isn’t the host offset for @guest_offset.

Sure, why not. We can also accept the exact offset within the cluster
and then use start_of_cluster(), but I prefer this one.

> And now that I’ve gone this far already, I might as well say that I’d
> like if it the comment noted that this function not only creates the
> L2Meta structure but also adds it to the cluster_allocs list.

I'll update the comment.

>> + * @guest_offset and @bytes indicate the offset and length of the
>> + * request.
>> + *
>> + * If @keep_old is true it means that the clusters were already
>> + * allocated and will be overwritten. If false then the clusters are
>> + * new and we have to decrease the reference count of the old ones.
>> + */
>> +static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>> +                              uint64_t guest_offset, uint64_t bytes,
>
> And now I’m so deep into non-RFC-level review territory, I might as
> well say I’d prefer @bytes to be an unsigned (or maybe even a plain
> int), because anything more wouldn’t work.  (Not least because
> cow_end_to is an unsigned).

True, I'll correct that too.

Thanks!

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature
  2019-10-26 21:25 ` [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2019-10-30 16:23   ` Max Reitz
  2019-10-30 22:38     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-10-30 16:23 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 3909 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>  docs/qcow2-cache.txt   | 19 +++++++++++-
>  2 files changed, 83 insertions(+), 4 deletions(-)

Sounds good, just a bit of bit nit-picking from my side.

> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index af5711e533..d34261f955 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt

[...]

> @@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
>  no backing file or the backing file is smaller than the image, they shall read
>  zeros for all parts that are not covered by the backing file.
>  
> +== Extended L2 Entries ==
> +
> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
> +field of the header.
> +
> +In these images standard data clusters are divided into 32 subclusters of the
> +same size. They are contiguous and start from the beginning of the cluster.
> +Subclusters can be allocated independently and the L2 entry contains information
> +indicating the status of each one of them. Compressed data clusters don't have
> +subclusters so they are treated like in images without this feature.
> +
> +The size of an extended L2 entry is 128 bits so the number of entries per table
> +is calculated using this formula:
> +
> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
> +
> +The first 64 bits have the same format as the standard L2 table entry described
> +in the previous section, with the exception of bit 0 of the standard cluster
> +descriptor.
> +
> +The last 64 bits contain a subcluster allocation bitmap with this format:
> +
> +Subcluster Allocation Bitmap (for standard clusters):
> +
> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
> +
> +                    1: the subcluster is allocated. In this case the
> +                       host cluster offset field must contain a valid
> +                       offset.
> +                    0: the subcluster is not allocated. In this case
> +                       read requests shall go to the backing file or
> +                       return zeros if there is no backing file data.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 31 - x)

I seem to remember that someone proposed this bit ordering to you, but I
wonder why.  So far everything in qcow2 starts from the least
significant bit, for example refcounts (“If refcount_bits implies a
sub-byte width, note that bit 0 means the least significant bit in this
context”), feature bits, and sub-byte structure descriptions in general
(which you reference directly with “bit x”).

Soo...  What’s the reason for doing it the other way around here?

> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
> +
> +                    1: the subcluster reads as zeros. In this case the
> +                       allocation status bit must be unset. The host
> +                       cluster offset field may or may not be set.
> +                    0: no effect.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 63 - x)
> +
> +Subcluster Allocation Bitmap (for compressed clusters):

Going from nit-picking to bike shedding: I’d drop the parentheses.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 09/26] qcow2: Add l2_entry_size()
  2019-10-26 21:25 ` [RFC PATCH v2 09/26] qcow2: Add l2_entry_size() Alberto Garcia
@ 2019-10-30 16:47   ` Max Reitz
  0 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-10-30 16:47 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 838 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> qcow2 images with subclusters have 128-bit L2 entries. The first 64
> bits contain the same information as traditional images and the last
> 64 bits form a bitmap with the status of each individual subcluster.
> 
> Because of that we cannot assume that L2 entries are sizeof(uint64_t)
> anymore. This function returns the proper value for the image.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c  | 12 ++++++------
>  block/qcow2-refcount.c | 14 ++++++++------
>  block/qcow2.c          |  6 +++---
>  block/qcow2.h          |  5 +++++
>  4 files changed, 22 insertions(+), 15 deletions(-)

I suppose qcow2_calc_prealloc_size(), qcow2_co_truncate()
(nb_new_l2_tables), and qcow2_measure() (l2_tables) also need some
adjustment.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2019-10-26 21:25 ` [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2019-10-30 16:55   ` Max Reitz
  2019-11-14 13:57     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-10-30 16:55 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2491 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
> 64 bits for the subcluster allocation bitmap.
> 
> In order to support them correctly get/set_l2_entry() need to be
> updated so they take the entry width into account in order to
> calculate the correct offset.
> 
> This patch also adds the get/set_l2_bitmap() functions that are used
> to access the bitmaps. For convenience, these functions are no-ops
> when used in traditional qcow2 images.

Granted, I haven’t seen the following patches yet, but if these
functions are indeed called for images that don’t have subclusters,
shouldn’t they return 0x0*0f*f then? (i.e. everything allocated)

If they aren’t, they should probably just abort().  Well,
set_l2_bitmap() should probably always abort() if there aren’t any
subclusters.

Max

> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 29a253bfb9..741c41c80f 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -507,15 +507,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
>  static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
>                                      int idx)
>  {
> +    idx *= l2_entry_size(s) / sizeof(uint64_t);
>      return be64_to_cpu(l2_slice[idx]);
>  }
>  
> +static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
> +                                     int idx)
> +{
> +    if (has_subclusters(s)) {
> +        idx *= l2_entry_size(s) / sizeof(uint64_t);
> +        return be64_to_cpu(l2_slice[idx + 1]);
> +    } else {
> +        return 0;
> +    }
> +}
> +
>  static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
>                                  int idx, uint64_t entry)
>  {
> +    idx *= l2_entry_size(s) / sizeof(uint64_t);
>      l2_slice[idx] = cpu_to_be64(entry);
>  }
>  
> +static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
> +                                 int idx, uint64_t bitmap)
> +{
> +    if (has_subclusters(s)) {
> +        idx *= l2_entry_size(s) / sizeof(uint64_t);
> +        l2_slice[idx + 1] = cpu_to_be64(bitmap);
> +    }
> +}
> +
>  static inline bool has_data_file(BlockDriverState *bs)
>  {
>      BDRVQcow2State *s = bs->opaque;
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature
  2019-10-30 16:23   ` Max Reitz
@ 2019-10-30 22:38     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-10-30 22:38 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 30 Oct 2019 05:23:30 PM CET, Max Reitz wrote:
>> +Subcluster Allocation Bitmap (for standard clusters):
>> +
>> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
>> +
>> +                    1: the subcluster is allocated. In this case the
>> +                       host cluster offset field must contain a valid
>> +                       offset.
>> +                    0: the subcluster is not allocated. In this case
>> +                       read requests shall go to the backing file or
>> +                       return zeros if there is no backing file data.
>> +
>> +                    Bits are assigned starting from the most significant one.
>> +                    (i.e. bit x is used for subcluster 31 - x)
>
> I seem to remember that someone proposed this bit ordering to you, but
> I wonder why.  So far everything in qcow2 starts from the least
> significant bit, for example refcounts (“If refcount_bits implies a
> sub-byte width, note that bit 0 means the least significant bit in
> this context”), feature bits, and sub-byte structure descriptions in
> general (which you reference directly with “bit x”).
>
> Soo...  What’s the reason for doing it the other way around here?

The reason is that I thought that it would be better for debugging
purposes. If I do an hexdump of the L2 table to see what's going on then
starting from the most significant bit gives me a better visual image of
what subclusters are allocated.

In other words, if the first two subclusters are allocated I think this
representation

               11000000 00000000 00000000 00000000  (c0 00 00 00)

is more natural than this one

               00000000 00000000 00000000 00000011  (00 00 00 03)

But I don't have a very strong opinion so I'm open to changing it.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type()
  2019-10-26 21:25 ` [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
@ 2019-11-04 12:35   ` Max Reitz
  2019-11-04 15:01     ` Max Reitz
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 12:35 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2235 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> This function returns the type of an individual subcluster. If an
> image does not have subclusters then this returns the exact same value
> as qcow2_get_cluster_type().
> 
> The information in standard and extended L2 entries is encoded in a
> slightly different way, but all existing QCow2ClusterType values are
> also valid for subclusters and have the same meanings (although they
> typically only apply to the requested subcluster).
> 
> There are two important exceptions to this:
> 
>   a) QCOW2_CLUSTER_COMPRESSED means that the whole cluster is
>      compressed. We do not support compression at the subcluster
>      level.
> 
>   b) QCOW2_CLUSTER_UNALLOCATED means that the cluster is unallocated,
>      that is, the offset field of the L2 entry does not point to a
>      host cluster. All subclusters are obviously unallocated too but
>      any of them could be of type QCOW2_CLUSTER_ZERO_PLAIN.
> 
> In addition to that, extended L2 entries allow one new scenario where
> the cluster is normally allocated but an individual subcluster is not.
> This is very different from (b) and because of that this patch adds a
> new value called QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER.
> 
> As a last thing, this patch adds QCOW2_CLUSTER_INVALID to detect the
> cases where an L2 entry has a value that violates the spec. The caller
> is responsible for handling these situations.
> 
> To prevent compatibility problems with images that have invalid values
> but are currently being read by QEMU without causing side effects,
> QCOW2_CLUSTER_INVALID is only returned for images with extended L2
> entries.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)

[...]

> +        case QCOW2_CLUSTER_ZERO_PLAIN:
> +        case QCOW2_CLUSTER_ZERO_ALLOC:
> +            return QCOW2_CLUSTER_INVALID;

The spec doesn’t say anything about this, though.

(I would have assumed it’s valid and the subcluster zero status is
simply ignored.  It makes sense to forbit this case, but the spec should
make a note of it.)

Nax


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  2019-10-26 21:25 ` [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
@ 2019-11-04 12:57   ` Max Reitz
  2019-11-04 13:03     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 12:57 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 837 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> In the previous patch we added a new QCow2ClusterType named
> QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
> where this new value needs to be handled, and that is what this patch
> does.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
This patch deals with everything in qcow2.c.  There are more places that
reference QCOW2_CLUSTER_* constants elsewhere, and I suppose most of
them are handled by the following patches.

But I wonder what the criterion is on where it needs to be handled and
where it’s OK not to.  Right now it looks to me like it’s a bit
arbitrary maybe?  But I suppose I’ll just have to wait until after the
next patches.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  2019-11-04 12:57   ` Max Reitz
@ 2019-11-04 13:03     ` Alberto Garcia
  2019-11-04 13:10       ` Max Reitz
  0 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-11-04 13:03 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Mon 04 Nov 2019 01:57:42 PM CET, Max Reitz wrote:
> On 26.10.19 23:25, Alberto Garcia wrote:
>> In the previous patch we added a new QCow2ClusterType named
>> QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
>> where this new value needs to be handled, and that is what this patch
>> does.
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>  block/qcow2.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
> This patch deals with everything in qcow2.c.  There are more places that
> reference QCOW2_CLUSTER_* constants elsewhere, and I suppose most of
> them are handled by the following patches.
>
> But I wonder what the criterion is on where it needs to be handled and
> where it’s OK not to.  Right now it looks to me like it’s a bit
> arbitrary maybe?  But I suppose I’ll just have to wait until after the
> next patches.

This is the part of the series that I'm the least happy about, because
the existing qcow2_get_cluster_type() can never return this new value, I
only updated the cases where this can actually happen.

I'm still considering a different approach for this.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  2019-11-04 13:03     ` Alberto Garcia
@ 2019-11-04 13:10       ` Max Reitz
  2019-11-07 14:44         ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 13:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1813 bytes --]

On 04.11.19 14:03, Alberto Garcia wrote:
> On Mon 04 Nov 2019 01:57:42 PM CET, Max Reitz wrote:
>> On 26.10.19 23:25, Alberto Garcia wrote:
>>> In the previous patch we added a new QCow2ClusterType named
>>> QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
>>> where this new value needs to be handled, and that is what this patch
>>> does.
>>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>  block/qcow2.c | 13 +++++++++----
>>>  1 file changed, 9 insertions(+), 4 deletions(-)
>> This patch deals with everything in qcow2.c.  There are more places that
>> reference QCOW2_CLUSTER_* constants elsewhere, and I suppose most of
>> them are handled by the following patches.
>>
>> But I wonder what the criterion is on where it needs to be handled and
>> where it’s OK not to.  Right now it looks to me like it’s a bit
>> arbitrary maybe?  But I suppose I’ll just have to wait until after the
>> next patches.
> 
> This is the part of the series that I'm the least happy about, because
> the existing qcow2_get_cluster_type() can never return this new value, I
> only updated the cases where this can actually happen.
> 
> I'm still considering a different approach for this.
I still don’t know what you’re doing in the later patches, but to me it
looks a bit like you don’t dare breaking up the existing structure that
just deals with clusters.

If that is so, I think it will help to make a clear cut between what
concerns subclusters and what concerns clusters at a whole.
QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER shouldn’t be in QCow2ClusterType;
there should be a separate QCow2SubclusterType.

OTOH, that would require more modifications, but (naïvely) I believe
that would make for the cleaner interface in the end.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta()
  2019-10-26 21:25 ` [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2019-11-04 14:21   ` Max Reitz
  2019-11-08 15:18     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 14:21 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 3195 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> If an image has subclusters then there are more copy-on-write
> scenarios that we need to consider. Let's say we have a write request
> from the middle of subcluster #3 until the end of the cluster:
> 
>    - If the cluster is new, then subclusters #0 to #3 from the old
>      cluster must be copied into the new one.

You mean for snapshots?

(That isn’t quite clear, and I only guess this based on the next bullet
point which differentiates based on “the old cluster was unallocated”.
That’s weird, too, because what does that mean, old cluster and new
cluster?  I suppose it’s abstract and it just means “There was no old
cluster and now we’ve allocated something”.  I can only understand the
concept of old and new clusters for COW inside of an image, i.e. for
snapshots and compressed clusters (theoretically).)

>    - If the cluster is new but the old cluster was unallocated, then
>      only subcluster #3 needs copy-on-write. #0 to #2 are marked as
>      unallocated in the bitmap of the new L2 entry.
> 
>    - If we are overwriting an old cluster and subcluster #3 is
>      unallocated or has the all-zeroes bit set then we need
>      copy-on-write on subcluster #3.
> 
>    - If we are overwriting an old cluster and subcluster #3 was
>      allocated then there is no need to copy-on-write.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 136 +++++++++++++++++++++++++++++++++---------
>  1 file changed, 108 insertions(+), 28 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 1f509bda15..990bc070af 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1034,14 +1034,16 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>   * If @keep_old is true it means that the clusters were already
>   * allocated and will be overwritten. If false then the clusters are
>   * new and we have to decrease the reference count of the old ones.
> + *
> + * Returns 1 on success, -errno on failure.

I think there should be a note here on why this doesn’t follow the
general 0/-errno schema (i.e., “, because that is what callers generally
expect”).

>   */

[...]

> +    if (!keep_old) {
> +        switch (type) {
> +        case QCOW2_CLUSTER_NORMAL:
> +        case QCOW2_CLUSTER_COMPRESSED:
> +        case QCOW2_CLUSTER_ZERO_ALLOC:
> +        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
> +            cow_start_from = 0;

Somehow (I don’t know why) I find this a bit tough to understand.

Wouldn’t it work to let cow_start start from the first subcluster for
ZERO_ALLOC and UNALLOCATED_SUBCLUSTER?  We don’t need to COW those, it
should be sufficient to just make the subclusters before that zero or
unallocated, respectively.

(Same for cow_end)

Max

> +            break;
> +        case QCOW2_CLUSTER_ZERO_PLAIN:
> +        case QCOW2_CLUSTER_UNALLOCATED:
> +            cow_start_from = sc_index << s->subcluster_bits;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-10-26 21:25 ` [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
@ 2019-11-04 14:58   ` Max Reitz
  2019-11-08 15:42     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 14:58 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 5452 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> The logic of this function remains pretty much the same, except that
> it uses count_contiguous_subclusters(), which combines the logic of
> count_contiguous_clusters() / count_contiguous_clusters_unallocated()
> and checks individual subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 111 ++++++++++++++++++++----------------------
>  1 file changed, 52 insertions(+), 59 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 990bc070af..e67559152f 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -372,66 +372,51 @@ fail:
>  }
>  
>  /*
> - * Checks how many clusters in a given L2 slice are contiguous in the image
> - * file. As soon as one of the flags in the bitmask stop_flags changes compared
> - * to the first cluster, the search is stopped and the cluster is not counted
> - * as contiguous. (This allows it, for example, to stop at the first compressed
> - * cluster which may require a different handling)
> + * Return the number of contiguous subclusters of the exact same type
> + * in a given L2 slice, starting from cluster @l2_index, subcluster
> + * @sc_index. At most @nb_clusters are checked. Allocated clusters are

I’d add a note that reassures that @nb_clusters really is a number of
clusters, not subclusters.  (Because if the variable were named “x”
(i.e., it’d be “At most @x are checked”), you’d assume it refers to a
number of subclusters.)

OTOH, what I don’t like so far about this series is that the “cluster
logic” is still everywhere when I think it should just be about
subclusters now.  (Except in few places where it must be about clusters
as in something that can have a distinct host offset and/or has an own
L2 entry.)  So maybe the parameter should really be @nb_subclusters.

But I’m not sure.  For how this function is written right now, it makes
sense for it to be @nb_clusters, but I think it could be changed so it
would work with @nb_subclusters, too.  (Just a single loop over the
subclusters and then loading l2_entry+l2_bitmap and checking the offset
at every cluster boundary.)

> + * also required to be contiguous in the image file.
>   */
> -static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
> -        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
> +static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
> +                                        unsigned sc_index, uint64_t *l2_slice,
> +                                        int l2_index)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    int i;
> -    QCow2ClusterType first_cluster_type;
> -    uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
> -    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
> -    uint64_t offset = first_entry & mask;
> -
> -    first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
> -    if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
> -        return 0;
> +    int i, j, count = 0;
> +    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
> +    uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
> +    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
> +    bool check_offset = true;
> +    QCow2ClusterType type =
> +        qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
> +
> +    assert(type != QCOW2_CLUSTER_INVALID); /* The caller should check this */
> +
> +    if (type == QCOW2_CLUSTER_COMPRESSED) {
> +        return 1; /* Compressed clusters are always counted one by one */

Hm, yes, but cluster by cluster, not subcluster by subcluster, so this
should be s->subclusters_per_cluster, and perhaps sc_index should be
asserted to be 0.  (Or it should be s->subclusters_per_cluster - sc_index.)

[...]

> @@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,

I suppose this is get_subcluster_offset now.

[...]

> @@ -587,6 +574,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>          goto fail;
>      }
>      switch (type) {
> +    case QCOW2_CLUSTER_INVALID:
> +        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found "
> +                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
> +                                l2_offset, l2_index);
> +        ret = -EIO;
> +        goto fail;
> +        break;

No need to break here.

>      case QCOW2_CLUSTER_COMPRESSED:
>          if (has_data_file(bs)) {
>              qcow2_signal_corruption(bs, true, -1, -1, "Compressed cluster "
> @@ -602,16 +596,15 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>          break;

Here (QCOW2_CLUSTER_COMPRESSED), c is 1, even though it should count
subclusters.

Max

>      case QCOW2_CLUSTER_ZERO_PLAIN:
>      case QCOW2_CLUSTER_UNALLOCATED:
> -        /* how many empty clusters ? */
> -        c = count_contiguous_clusters_unallocated(bs, nb_clusters,
> -                                                  l2_slice, l2_index, type);
> +        c = count_contiguous_subclusters(bs, nb_clusters, sc_index,
> +                                         l2_slice, l2_index);
>          *cluster_offset = 0;
>          break;


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type()
  2019-11-04 12:35   ` Max Reitz
@ 2019-11-04 15:01     ` Max Reitz
  0 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-11-04 15:01 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2182 bytes --]

On 04.11.19 13:35, Max Reitz wrote:
> On 26.10.19 23:25, Alberto Garcia wrote:
>> This function returns the type of an individual subcluster. If an
>> image does not have subclusters then this returns the exact same value
>> as qcow2_get_cluster_type().
>>
>> The information in standard and extended L2 entries is encoded in a
>> slightly different way, but all existing QCow2ClusterType values are
>> also valid for subclusters and have the same meanings (although they
>> typically only apply to the requested subcluster).
>>
>> There are two important exceptions to this:
>>
>>   a) QCOW2_CLUSTER_COMPRESSED means that the whole cluster is
>>      compressed. We do not support compression at the subcluster
>>      level.
>>
>>   b) QCOW2_CLUSTER_UNALLOCATED means that the cluster is unallocated,
>>      that is, the offset field of the L2 entry does not point to a
>>      host cluster. All subclusters are obviously unallocated too but
>>      any of them could be of type QCOW2_CLUSTER_ZERO_PLAIN.
>>
>> In addition to that, extended L2 entries allow one new scenario where
>> the cluster is normally allocated but an individual subcluster is not.
>> This is very different from (b) and because of that this patch adds a
>> new value called QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER.
>>
>> As a last thing, this patch adds QCOW2_CLUSTER_INVALID to detect the
>> cases where an L2 entry has a value that violates the spec. The caller
>> is responsible for handling these situations.
>>
>> To prevent compatibility problems with images that have invalid values
>> but are currently being read by QEMU without causing side effects,
>> QCOW2_CLUSTER_INVALID is only returned for images with extended L2
>> entries.
>>
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>  block/qcow2.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
> 
> [...]
> 
>> +        case QCOW2_CLUSTER_ZERO_PLAIN:
>> +        case QCOW2_CLUSTER_ZERO_ALLOC:
>> +            return QCOW2_CLUSTER_INVALID;
> 
> The spec doesn’t say anything about this, though.

It does, I just forgot. O:-)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-10-26 21:25 ` [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2019-11-04 15:04   ` Max Reitz
  2019-11-04 15:10     ` Max Reitz
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 15:04 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2003 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e67559152f..3e4ba8d448 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1852,7 +1852,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>      assert(nb_clusters <= INT_MAX);
>  
>      for (i = 0; i < nb_clusters; i++) {
> -        uint64_t old_offset;
> +        uint64_t old_offset, l2_entry = 0;
>          QCow2ClusterType cluster_type;
>  
>          old_offset = get_l2_entry(s, l2_slice, l2_index + i);
> @@ -1869,12 +1869,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>  
>          qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>          if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
> -            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>              qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

It feels wrong to me to free the cluster before updating the L2 entry.

Max

>          } else {
> -            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
> -            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
> +            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
>          }
> +
> +        if (has_subclusters(s)) {
> +            set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
> +        } else {
> +            l2_entry |= QCOW_OFLAG_ZERO;
> +        }
> +
> +        set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
>      }
>  
>      qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-10-26 21:25 ` [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2019-11-04 15:07   ` Max Reitz
  2019-11-14 15:33     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 15:07 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1292 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 3e4ba8d448..aa3eb727a5 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1772,7 +1772,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>  
>          /* First remove L2 entries */
>          qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> -        if (!full_discard && s->qcow_version >= 3) {
> +        if (has_subclusters(s)) {
> +            set_l2_entry(s, l2_slice, l2_index + i, 0);
> +            set_l2_bitmap(s, l2_slice, l2_index + i,
> +                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);

But only for !full_discard, right?

Max

> +        } else if (!full_discard && s->qcow_version >= 3) {
>              set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>          } else {
>              set_l2_entry(s, l2_slice, l2_index + i, 0);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-11-04 15:04   ` Max Reitz
@ 2019-11-04 15:10     ` Max Reitz
  2019-11-14 15:31       ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-04 15:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1544 bytes --]

On 04.11.19 16:04, Max Reitz wrote:
> On 26.10.19 23:25, Alberto Garcia wrote:
>> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>> image has subclusters. Instead, the individual 'all zeroes' bits must
>> be used.
>>
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>  block/qcow2-cluster.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index e67559152f..3e4ba8d448 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1852,7 +1852,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>      assert(nb_clusters <= INT_MAX);
>>  
>>      for (i = 0; i < nb_clusters; i++) {
>> -        uint64_t old_offset;
>> +        uint64_t old_offset, l2_entry = 0;
>>          QCow2ClusterType cluster_type;
>>  
>>          old_offset = get_l2_entry(s, l2_slice, l2_index + i);
>> @@ -1869,12 +1869,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>  
>>          qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>          if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
>> -            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>>              qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
> 
> It feels wrong to me to free the cluster before updating the L2 entry.

(Although it’s pre-existing, as set_l2_entry() is just an in-cache
operation anyway :-/)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2019-10-26 21:25 ` [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
@ 2019-11-05 11:05   ` Max Reitz
  2019-11-14 15:43     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-05 11:05 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1545 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Two changes are needed in order to add subcluster support to this
> function: deallocated clusters must have their bitmaps cleared, and
> expanded clusters must have all the "subcluster allocated" bits set.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index aa3eb727a5..62f2a9fcc0 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -2036,6 +2036,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>                          /* not backed; therefore we can simply deallocate the
>                           * cluster */
>                          set_l2_entry(s, l2_slice, j, 0);
> +                        set_l2_bitmap(s, l2_slice, j, 0);
>                          l2_dirty = true;
>                          continue;
>                      }
> @@ -2102,6 +2103,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>                  } else {
>                      set_l2_entry(s, l2_slice, j, offset);
>                  }
> +                set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
>                  l2_dirty = true;
>              }

Technically this isn’t needed because this function is only called when
downgrading v3 to v2 images (which can’t have subclusters), but of
course it won’t hurt.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2019-10-26 21:25 ` [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2019-11-05 11:43   ` Max Reitz
  2019-11-14 16:30     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-05 11:43 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2280 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> The L2 bitmap needs to be updated after each write to indicate what
> new subclusters are now allocated.
> 
> This needs to happen even if the cluster was already allocated and the
> L2 entry was otherwise valid.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index fb6cf8df17..acb7226e03 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -980,6 +980,22 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>  
>          set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
>                       (cluster_offset + (i << s->cluster_bits)));
> +
> +        /* Update bitmap with the subclusters that were just written */
> +        if (has_subclusters(s)) {
> +            uint64_t written_from = m->cow_start.offset;
> +            uint64_t written_to = m->cow_end.offset + m->cow_end.nb_bytes;

It’s a bit strange to make these uint64_t when all the fields queried
are of type unsigned, but more on that below.

> +            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
> +            int sc;
> +            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
> +                uint64_t sc_off = i * s->cluster_size + sc * s->subcluster_size;

It’s weird to give this a uint64_t type when all the variables in the
term are of type int.

I’m not sure whether it can overflow.  handle_alloc() limits everything
to INT_MAX, but I’m not sure about handle_copied().

Speaking of handle_copied(); both elements of Qcow2COWRegion are of type
unsigned.  handle_copied() doesn’t look like it takes any precautions to
limit the range to even UINT_MAX (and it should probably limit it to
INT_MAX).

Max

> +                if (sc_off >= written_from && sc_off < written_to) {
> +                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
> +                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
> +                }
> +            }
> +            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
> +        }
>       }
>  
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space()
  2019-10-26 21:25 ` [RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2019-11-05 12:05   ` Max Reitz
  0 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-11-05 12:05 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1470 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> The bdrv_co_pwrite_zeroes() call here fills complete clusters with
> zeroes, but it can happen that some subclusters are not part of the
> write request or the copy-on-write. This patch makes sure that only
> the affected subclusters are overwritten.
> 
> A potential improvement would be to also fill with zeroes the other
> subclusters if we can guarantee that we are not overwriting existing
> data. However this would waste more disk space, so we should first
> evaluate if it's really worth doing.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 0261e87709..01322ca449 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2304,6 +2304,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>  
>      for (m = l2meta; m != NULL; m = m->next) {
>          int ret;
> +        uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
> +        uint64_t nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
> +            m->cow_start.offset;

It might be more honest to make nb_bytes an unsigned.  (There shouldn’t
be any overflows here, because the total size is limited to INT64_MAX by
handle_alloc().)

Max

>          if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
>              continue;


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-10-26 21:25 ` [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2019-11-05 12:47   ` Max Reitz
  2019-11-15 13:35     ` Alberto Garcia
  0 siblings, 1 reply; 65+ messages in thread
From: Max Reitz @ 2019-11-05 12:47 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 6478 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Now that the implementation of subclusters is complete we can finally
> add the necessary options to create and read images with this feature,
> which we call "extended L2 entries".
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c                    |  46 ++++++++++++++
>  block/qcow2.h                    |   8 ++-
>  include/block/block_int.h        |   1 +
>  qapi/block-core.json             |   6 ++
>  tests/qemu-iotests/031.out       |   8 +--
>  tests/qemu-iotests/036.out       |   4 +-
>  tests/qemu-iotests/049.out       | 102 +++++++++++++++----------------
>  tests/qemu-iotests/060.out       |   1 +
>  tests/qemu-iotests/061.out       |  20 +++---
>  tests/qemu-iotests/065           |  18 ++++--
>  tests/qemu-iotests/082.out       |  48 ++++++++++++---
>  tests/qemu-iotests/085.out       |  38 ++++++------
>  tests/qemu-iotests/144.out       |   4 +-
>  tests/qemu-iotests/182.out       |   2 +-
>  tests/qemu-iotests/185.out       |   8 +--
>  tests/qemu-iotests/198.out       |   2 +
>  tests/qemu-iotests/206.out       |   4 ++
>  tests/qemu-iotests/242.out       |   5 ++
>  tests/qemu-iotests/255.out       |   8 +--
>  tests/qemu-iotests/common.filter |   1 +
>  20 files changed, 224 insertions(+), 110 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 537569ce88..b1fa7ab5da 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1347,6 +1347,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>      s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
>      s->subcluster_bits = ctz32(s->subcluster_size);
>  
> +    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
> +        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
>      /* Check support for various header values */
>      if (header.refcount_order > 6) {
>          error_setg(errp, "Reference count entry width too large; may not "

It feels like this hunk should be part of the patch that added the
s->subcluster_size assignment.

> @@ -2806,6 +2812,11 @@ int qcow2_update_header(BlockDriverState *bs)
>                  .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
>                  .name = "lazy refcounts",
>              },
> +            {
> +                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
> +                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
> +                .name = "extended L2 entries",
> +            },
>          };
>  
>          ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
> @@ -3271,6 +3282,27 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
>          goto out;
>      }
>  
> +    if (!qcow2_opts->has_extended_l2) {
> +        qcow2_opts->extended_l2 = false;
> +    }
> +    if (qcow2_opts->extended_l2) {
> +        unsigned min_cluster_size =
> +            (1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER;
> +        if (version < 3) {
> +            error_setg(errp, "Extended L2 entries are only supported with "
> +                       "compatibility level 1.1 and above (use version=v3 or "

Pre-existing, but old-style creation doesn’t allow version=v3.  I
suppose someone(tm) should fix that...

> +                       "greater)");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        if (cluster_size < min_cluster_size) {
> +            error_setg(errp, "Extended L2 entries are only supported with "
> +                       "cluster sizes of at least %u bytes", min_cluster_size);
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +    }
> +
>      if (!qcow2_opts->has_preallocation) {
>          qcow2_opts->preallocation = PREALLOC_MODE_OFF;
>      }

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index aa97ee2641..6e4c81ae1c 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -66,6 +66,8 @@
>  #                 standalone (read-only) raw image without looking at qcow2
>  #                 metadata (since: 4.0)
>  #
> +# @extended-l2: true if the image has extended L2 entries (since 4.2)
> +#

I think there should be the same note here as for lazy-refcounts.  So
this field is not present for compat=0.10, and present otherwise.
(Without that note, it looks as if this field maybe is only present if
it’s true.)

(Personally, I’d also prefer it to be an “iff”, but who cares.)

>  # @lazy-refcounts: on or off; only valid for compat >= 1.1
>  #
>  # @corrupt: true if the image has been marked corrupt; only valid for
> @@ -85,6 +87,7 @@
>        'compat': 'str',
>        '*data-file': 'str',
>        '*data-file-raw': 'bool',
> +      '*extended-l2': 'bool',
>        '*lazy-refcounts': 'bool',
>        '*corrupt': 'bool',
>        'refcount-bits': 'int',
> @@ -4372,6 +4375,8 @@
>  # @data-file-raw    True if the external data file must stay valid as a
>  #                   standalone (read-only) raw image without looking at qcow2
>  #                   metadata (default: false; since: 4.0)
> +# @extended-l2      True if the image has extended L2 entries

s/if the image has/to make the image have/ (or something like it)

> +#                   (default: false; since 4.2)
>  # @size             Size of the virtual disk in bytes
>  # @version          Compatibility level (default: v3)
>  # @backing-file     File name of the backing file if a backing file

[...]

> diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
> index 9f418b4881..45c0fe746b 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -137,6 +137,7 @@ _filter_img_create()
>          -e "s# adapter_type=[^ ]*##g" \
>          -e "s# hwversion=[^ ]*##g" \
>          -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
> +        -e "s# extended_l2=\\(on\\|off\\)##g" \
>          -e "s# block_size=[0-9]\\+##g" \
>          -e "s# block_state_zero=\\(on\\|off\\)##g" \
>          -e "s# log_size=[0-9]\\+##g" \

Filtering it seems like a good idea so one could run the iotests with -o
extended_l2=on.  There are still a ton of tests that don’t really seem
filtered, though.  Maybe my “Allow ./check -o data_file” series improves
that? :?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set
  2019-10-26 21:25 ` [RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set Alberto Garcia
@ 2019-11-05 13:11   ` Max Reitz
  0 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-11-05 13:11 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1581 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Traditional qcow2 images don't allow preallocation if a backing file
> is set. This is because once a cluster is allocated there is no way to
> tell that its data should be read from the backing file.
> 
> Extended L2 entries have individual allocation bits for each
> subcluster, and therefore it is perfectly possible to have an
> allocated cluster with all its subclusters unallocated.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c              | 7 ++++---
>  tests/qemu-iotests/206.out | 2 +-
>  2 files changed, 5 insertions(+), 4 deletions(-)

But it doesn’t work, because qcow2_alloc_cluster_offset() always
allocates the whole cluster, so the backing file content isn’t visible:

$ ./qemu-img create -f qcow2 base.qcow2 64M
Formatting 'base.qcow2', fmt=qcow2 size=67108864 cluster_size=65536
lazy_refcounts=off extended_l2=off refcount_bits=16

$ ./qemu-io -c 'write -P 42 0 64M' base.qcow2
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 00.21 sec (307.344 MiB/sec and 4.8022 ops/sec)

$ ./qemu-img create -f qcow2 -o preallocation=metadata,extended_l2=on \
    top.qcow2
Formatting 'top.qcow2', fmt=qcow2 size=67108864 backing_file=base.qcow2
cluster_size=65536 preallocation=metadata lazy_refcounts=off
extended_l2=on refcount_bits=16

$ ./qemu-io -c 'read -P 42 0 64M' top.qcow2
Pattern verification failed at offset 0, 67108864 bytes
read 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 00.03 sec (2.498 GiB/sec and 39.9725 ops/sec)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 00/26] Add subcluster allocation to qcow2
  2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (25 preceding siblings ...)
  2019-10-26 21:25 ` [RFC PATCH v2 26/26] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
@ 2019-11-05 13:32 ` Max Reitz
  26 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-11-05 13:32 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 2778 bytes --]

On 26.10.19 23:25, Alberto Garcia wrote:
> Hi,
> 
> here's the new version of the patches to add subcluster allocation
> support to qcow2.
> 
> Please refer to the cover letter of the first version for a full
> description of the patches:
> 
>    https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html
> 
> This version includes a few tests, but I'm planning to add more for
> the next revision.

I think what would help most with testing is if it were possible to
simply run the iotests with -o extended_l2=on.

In general, the RFC looks OK to me.  The one thing I dislike is that I
feel that it is a bit, well, uncourageous.  Now, after looking at the
series, I don’t know whether you really changed everything that needs to
be changed so it can deal with subclusters.

To me it feels like this is because you tried to keep everything as it
is and only do minimal changes.  That is usually a good thing, but here
I don’t know, because this way we can’t simply grep for places that need
fixing (because they use /\<cluster/ instead of /subcluster/).

To me it feels like with subclusters, the whole design should be
different.  Note that the following is just a very naïve idea, but
anyway: I feel like what we need to separate isn’t L2 entries vs.
clusters vs. subclusters (so a separation based on, well, syntax?) but a
separation based on offset vs. allocation status (so a separation based
on semantics).

So I imagine there would be one function that sets a whole cluster’s
(i.e., a group of subclusters) allocation offset; and another function
that sets individual subclusters’ allocation status.

Reversely, there should be a function to query a cluster’s/subcluster’s
allocation offset, and another to query a subcluster’s type.

To me it looks like the places where
QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER is handled separately from
QCOW2_CLUSTER_UNALLOCATED are places where we’re really not interested
in the subcluster’s type at all, but just whether there’s an allocation
or not.  This is the case in qcow2_get_cluster_offset(),
calculate_l2_meta(), and qcow2_co_block_status().

(These places should then use the function to query the allocation
offset and evaluate the result instead of querying the subcluster type.)


Does that sound in any way acceptable to you?  You have more insight
into this now and so maybe you know already that it can’t work.
(Or maybe it’s just too invasive.)


Right now, all I can do is grep for QCOW2_CLUSTER_ and set_l2_entry().
And all of those places look OK to me.  But I just can’t shake off the
uneasy feeling of not being able to really know whether this series
really got to all the places that need adjustment.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  2019-11-04 13:10       ` Max Reitz
@ 2019-11-07 14:44         ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-07 14:44 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Mon 04 Nov 2019 02:10:37 PM CET, Max Reitz wrote:

   [QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER]
> I still don’t know what you’re doing in the later patches, but to me
> it looks a bit like you don’t dare breaking up the existing structure
> that just deals with clusters.

Yeah, I decided to extend the existing type to make the changes less
invasive, but I'm also leaning towards creating a different type
now. I'll give it a try.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta()
  2019-11-04 14:21   ` Max Reitz
@ 2019-11-08 15:18     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-08 15:18 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Mon 04 Nov 2019 03:21:41 PM CET, Max Reitz wrote:
>> If an image has subclusters then there are more copy-on-write
>> scenarios that we need to consider. Let's say we have a write request
>> from the middle of subcluster #3 until the end of the cluster:
>> 
>>    - If the cluster is new, then subclusters #0 to #3 from the old
>>      cluster must be copied into the new one.
>
> You mean for snapshots?
>
> (That isn’t quite clear, and I only guess this based on the next
> bullet point which differentiates based on “the old cluster was
> unallocated”.  That’s weird, too, because what does that mean, old
> cluster and new cluster?

Yes, perhaps the terminology is a bit unclear.

When I say "new cluster" is this context I mean that a write request
requires that a new cluster is allocated in the qcow2 file.

Then the "old cluster" would be what was there before the write (i.e. a
cluster with refcount > 1 or an unallocated cluster). Where we are doing
the copy-on-write from.

>>   * If @keep_old is true it means that the clusters were already
>>   * allocated and will be overwritten. If false then the clusters are
>>   * new and we have to decrease the reference count of the old ones.
>> + *
>> + * Returns 1 on success, -errno on failure.
>
> I think there should be a note here on why this doesn’t follow the
> general 0/-errno schema (i.e., “, because that is what callers generally
> expect”).

Good idea.

>> +    if (!keep_old) {
>> +        switch (type) {
>> +        case QCOW2_CLUSTER_NORMAL:
>> +        case QCOW2_CLUSTER_COMPRESSED:
>> +        case QCOW2_CLUSTER_ZERO_ALLOC:
>> +        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
>> +            cow_start_from = 0;
>
> Somehow (I don’t know why) I find this a bit tough to understand.
>
> Wouldn’t it work to let cow_start start from the first subcluster for
> ZERO_ALLOC and UNALLOCATED_SUBCLUSTER?  We don’t need to COW those, it
> should be sufficient to just make the subclusters before that zero or
> unallocated, respectively.

Here's one good example why I should probably add a QCow2SubclusterType
different from the existing QCow2ClusterType.

In this context, 'type' is the type of the subcluster, and because of
that _ZERO_ALLOC means that the subcluster reads as zeros but the
cluster itself is allocated. Other subcluster may contain data and
that's why we have to copy all of them.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-11-04 14:58   ` Max Reitz
@ 2019-11-08 15:42     ` Alberto Garcia
  2019-11-11  8:42       ` Max Reitz
  0 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-11-08 15:42 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Mon 04 Nov 2019 03:58:57 PM CET, Max Reitz wrote:
> OTOH, what I don’t like so far about this series is that the “cluster
> logic” is still everywhere when I think it should just be about
> subclusters now.  (Except in few places where it must be about
> clusters as in something that can have a distinct host offset and/or
> has an own L2 entry.)  So maybe the parameter should really be
> @nb_subclusters.

> But I’m not sure.  For how this function is written right now, it
> makes sense for it to be @nb_clusters, but I think it could be changed
> so it would work with @nb_subclusters, too.

I'm still reviewing your (much appreciated) feedback, but one thing I
can tell you is that my initial versions were doing everything with
subclusters because of the reasons you mention (i.e. there was
@nb_subclusters and all that).

Later when I started to tidy things up I realized that most of those
places only needed the number of clusters after all, and in some cases
the necessary changes were really minimal (like in handle_copied()).

>> +static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
>> +                                        unsigned sc_index, uint64_t *l2_slice,
>> +                                        int l2_index)
>>  {
   /* ... */
>> +    if (type == QCOW2_CLUSTER_COMPRESSED) {
>> +        return 1; /* Compressed clusters are always counted one by one */
>
> Hm, yes, but cluster by cluster, not subcluster by subcluster, so this
> should be s->subclusters_per_cluster, and perhaps sc_index should be
> asserted to be 0.  (Or it should be s->subclusters_per_cluster -
> sc_index.)

Right, that's a bug, it forces the caller to decompress the cluster 32
times in order to read it completely! Thanks!

(in reality this is not used because this function doesn't get called
for compressed clusters but the same problem happens in the calling
function, as you correctly point out. Maybe I should assert here
instead)

>> @@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>
> I suppose this is get_subcluster_offset now.

Hmmm no, this returns the actual host cluster offset, then the caller
uses offset_into_cluster() to get the final value (which doesn't need to
be subcluster-aligned anyway).

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-11-08 15:42     ` Alberto Garcia
@ 2019-11-11  8:42       ` Max Reitz
  0 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-11-11  8:42 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 1493 bytes --]

On 08.11.19 16:42, Alberto Garcia wrote:
> On Mon 04 Nov 2019 03:58:57 PM CET, Max Reitz wrote:

[...]

>>> @@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>>
>> I suppose this is get_subcluster_offset now.
> 
> Hmmm no, this returns the actual host cluster offset, then the caller
> uses offset_into_cluster() to get the final value (which doesn't need to
> be subcluster-aligned anyway).

It still returns the subcluster type, though.  Which makes the whole
thing kind of a weird chimera...  Maybe it would work better if the
subcluster type would be stored in a pointer parameter, so it’d be clear?

But does it have actually have any advantages to on one hand return a
cluster-aligned host offset and on the other return a bytes count that
starts at the mid-cluster offset?  That’s weird already.  Maybe the
offset that’s returned should just not be cluster-aligned and the whole
function be called qcow2_get_host_offset().

I suppose one reason (maybe the only one?) to return aligned offsets is
for compressed clusters.  It would be another kind of magic to return
aligned offsets only for them.  But maybe it’s worth it?  (Judging from
a quick glance, it doesn’t look too difficult to me to modify the
callers to accept a mid-cluster host offset.)

(Another thing I just noticed is that the comment above
qcow2_get_cluster_offset() needs some fixing, because it still refers to
whole clusters.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-10-30 14:24   ` Max Reitz
@ 2019-11-13 14:25     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-13 14:25 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 30 Oct 2019 03:24:08 PM CET, Max Reitz wrote:
>>  static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>>                                uint64_t guest_offset, uint64_t bytes,
>> -                              QCowL2Meta **m, bool keep_old)
>> +                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
>>  {
>>      BDRVQcow2State *s = bs->opaque;
>> -    unsigned cow_start_from = 0;
>> +    int l2_index = offset_to_l2_slice_index(s, guest_offset);
>> +    uint64_t l2_entry;
>> +    unsigned cow_start_from, cow_end_to;
>>      unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>>      unsigned cow_end_from = cow_start_to + bytes;
>> -    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>>      unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>>      QCowL2Meta *old_m = *m;
>> +    QCow2ClusterType type;
>> +
>> +    /* Return if there's no COW (all clusters are normal and we keep them) */
>> +    if (keep_old) {
>> +        int i;
>> +        for (i = 0; i < nb_clusters; i++) {
>> +            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
>
> I’d assert somewhere that l2_index + nb_clusters - 1 won’t overflow.
>
>> +            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
>
> Wouldn’t cluster_needs_cow() be better?

The semantics of cluster_needs_cow() change in this patch (which also
updates the documentation). But I should maybe change the name instead.

>> +                break;
>> +            }
>> +        }
>> +        if (i == nb_clusters) {
>> +            return;
>> +        }
>> +    }
>
> So I understand we always need to create an L2Meta structure in all
> other cases because we at least need to turn those clusters into
> normal clusters?  (Even if they’re already allocated, as in the case
> of allocated zero clusters.)

That's correct.

>> -/* Returns true if writing to a cluster requires COW */
>> +/* Returns true if the cluster is unallocated or has refcount > 1 */
>>  static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
>>  {
>>      switch (qcow2_get_cluster_type(bs, l2_entry)) {
>>      case QCOW2_CLUSTER_NORMAL:
>> +    case QCOW2_CLUSTER_ZERO_ALLOC:
>>          if (l2_entry & QCOW_OFLAG_COPIED) {
>>              return false;
>
> Don’t zero-allocated clusters need COW always?  (Because the at the
> given host offset isn’t guaranteed to be zero.)

Yeah, hence the semantics change I described earlier. I should probably
call it cluster_needs_new_allocation() or something like that, which is
what this means now ("true if unallocated or refcount > 1").

>> - * Returns the number of contiguous clusters that can be used for an allocating
>> - * write, but require COW to be performed (this includes yet unallocated space,
>> - * which must copy from the backing file)
>> + * Returns the number of contiguous clusters that can be written to
>> + * using one single write request, starting from @l2_index.
>> + * At most @nb_clusters are checked.
>> + *
>> + * If @want_cow is true this counts clusters that are either
>> + * unallocated, or allocated but with refcount > 1.
>
> +(So they need to be newly allocated and COWed)?

Yes. Which in this context is the same as "newly allocated" I guess,
because every newly allocated cluster requires COW.

> (Or is the past participle of COW COWn?  Or maybe CedOW?)

:-))

>> + * If @want_cow is false this counts clusters that are already
>> + * allocated and can be written to using their current locations
>
> s/using their current locations/in-place/?

Ok.

>> @@ -1475,13 +1489,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>>      *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>>      assert(*bytes != 0);
>>  
>> -    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
>> -                      m, keep_old_clusters);
>> +    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
>> +                      m, false);
>>  
>> -    return 1;
>> +    ret = 1;
>>  
>> -fail:
>> -    if (*m && (*m)->nb_clusters > 0) {
>> +out:
>> +    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
>
> Is this a bug fix?

No, we call qcow2_cache_put() later because calculate_l2_meta() needs
l2_slice.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2019-10-30 16:55   ` Max Reitz
@ 2019-11-14 13:57     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-14 13:57 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 30 Oct 2019 05:55:04 PM CET, Max Reitz wrote:
>> This patch also adds the get/set_l2_bitmap() functions that are used
>> to access the bitmaps. For convenience, these functions are no-ops
>> when used in traditional qcow2 images.
>
> Granted, I haven’t seen the following patches yet, but if these
> functions are indeed called for images that don’t have subclusters,
> shouldn’t they return 0x0*0f*f then? (i.e. everything allocated)
>
> If they aren’t, they should probably just abort().  Well,
> set_l2_bitmap() should probably always abort() if there aren’t any
> subclusters.

Yeah, set_l2_bitmap() should abort (I had this changed already).

About get_l2_bitmap() ... I decided not to abort for convenience, for
cases like this one:

    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
    uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);

Here the value of l2_bitmap is going to be ignored anyway so it doesn't
matter what we return, but perhaps for consistency we should return
QCOW_OFLAG_SUB_ALLOC(0), which means that the first (and only, in this
case) subcluster is allocated.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-11-04 15:10     ` Max Reitz
@ 2019-11-14 15:31       ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-14 15:31 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Mon 04 Nov 2019 04:10:58 PM CET, Max Reitz wrote:
>>>          qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>>          if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
>>> -            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>>>              qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>> 
>> It feels wrong to me to free the cluster before updating the L2
>> entry.
>
> (Although it’s pre-existing, as set_l2_entry() is just an in-cache
> operation anyway :-/)

Yes, I think that if you want to do it afterwards you need to add
another loop after the qcow2_cache_put() call.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-11-04 15:07   ` Max Reitz
@ 2019-11-14 15:33     ` Alberto Garcia
  2019-11-14 16:22       ` Max Reitz
  0 siblings, 1 reply; 65+ messages in thread
From: Alberto Garcia @ 2019-11-14 15:33 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Mon 04 Nov 2019 04:07:35 PM CET, Max Reitz wrote:
>>          /* First remove L2 entries */
>>          qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>> -        if (!full_discard && s->qcow_version >= 3) {
>> +        if (has_subclusters(s)) {
>> +            set_l2_entry(s, l2_slice, l2_index + i, 0);
>> +            set_l2_bitmap(s, l2_slice, l2_index + i,
>> +                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
>
> But only for !full_discard, right?

Hence the conditional operator in my patch, maybe you didn't see it?

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2019-11-05 11:05   ` Max Reitz
@ 2019-11-14 15:43     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-14 15:43 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Tue 05 Nov 2019 12:05:02 PM CET, Max Reitz wrote:
>> @@ -2102,6 +2103,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>>                  } else {
>>                      set_l2_entry(s, l2_slice, j, offset);
>>                  }
>> +                set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
>>                  l2_dirty = true;
>>              }
>
> Technically this isn’t needed because this function is only called
> when downgrading v3 to v2 images (which can’t have subclusters), but
> of course it won’t hurt.

Right, but we need to change the function anyway to either do this or
assert that there are no subclusters. I prefer to do this because it's
trivial but I won't oppose if someone prefers the alternative.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-11-14 15:33     ` Alberto Garcia
@ 2019-11-14 16:22       ` Max Reitz
  0 siblings, 0 replies; 65+ messages in thread
From: Max Reitz @ 2019-11-14 16:22 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

[-- Attachment #1.1: Type: text/plain, Size: 698 bytes --]

On 14.11.19 16:33, Alberto Garcia wrote:
> On Mon 04 Nov 2019 04:07:35 PM CET, Max Reitz wrote:
>>>          /* First remove L2 entries */
>>>          qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>> -        if (!full_discard && s->qcow_version >= 3) {
>>> +        if (has_subclusters(s)) {
>>> +            set_l2_entry(s, l2_slice, l2_index + i, 0);
>>> +            set_l2_bitmap(s, l2_slice, l2_index + i,
>>> +                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
>>
>> But only for !full_discard, right?
> 
> Hence the conditional operator in my patch, maybe you didn't see it?

Oops, yep, I was just hung up on the if conditional.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2019-11-05 11:43   ` Max Reitz
@ 2019-11-14 16:30     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-14 16:30 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Tue 05 Nov 2019 12:43:16 PM CET, Max Reitz wrote:

> Speaking of handle_copied(); both elements of Qcow2COWRegion are of
> type unsigned.  handle_copied() doesn’t look like it takes any
> precautions to limit the range to even UINT_MAX (and it should
> probably limit it to INT_MAX).

Or rather, both handle_copied() and handle_alloc() should probably limit
it to BDRV_REQUEST_MAX_BYTES.

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-11-05 12:47   ` Max Reitz
@ 2019-11-15 13:35     ` Alberto Garcia
  0 siblings, 0 replies; 65+ messages in thread
From: Alberto Garcia @ 2019-11-15 13:35 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Tue 05 Nov 2019 01:47:58 PM CET, Max Reitz wrote:
>> @@ -1347,6 +1347,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>>      s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
>>      s->subcluster_bits = ctz32(s->subcluster_size);
>>  
>> +    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
>> +        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>>      /* Check support for various header values */
>>      if (header.refcount_order > 6) {
>>          error_setg(errp, "Reference count entry width too large; may not "
>
> It feels like this hunk should be part of the patch that added the
> s->subcluster_size assignment.

I don't have a strong opinion, but that patch simply defines the basic
attributes that are used elsewhere in the code, and this is the patch
where the values are checked (for creation and opening).

Berto


^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, back to index

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-26 21:25 [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 01/26] qcow2: Add calculate_l2_meta() Alberto Garcia
2019-10-28 12:50   ` Vladimir Sementsov-Ogievskiy
2019-10-30 15:56     ` Alberto Garcia
2019-10-30 12:04   ` Max Reitz
2019-10-30 16:02     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 02/26] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
2019-10-28 13:55   ` Vladimir Sementsov-Ogievskiy
2019-10-26 21:25 ` [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
2019-10-30 14:24   ` Max Reitz
2019-11-13 14:25     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 04/26] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 05/26] qcow2: Document the Extended L2 Entries feature Alberto Garcia
2019-10-30 16:23   ` Max Reitz
2019-10-30 22:38     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 06/26] qcow2: Add dummy has_subclusters() function Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 07/26] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 08/26] qcow2: Add offset_to_sc_index() Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 09/26] qcow2: Add l2_entry_size() Alberto Garcia
2019-10-30 16:47   ` Max Reitz
2019-10-26 21:25 ` [RFC PATCH v2 10/26] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
2019-10-30 16:55   ` Max Reitz
2019-11-14 13:57     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 11/26] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
2019-11-04 12:35   ` Max Reitz
2019-11-04 15:01     ` Max Reitz
2019-10-26 21:25 ` [RFC PATCH v2 12/26] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
2019-11-04 12:57   ` Max Reitz
2019-11-04 13:03     ` Alberto Garcia
2019-11-04 13:10       ` Max Reitz
2019-11-07 14:44         ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 13/26] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
2019-11-04 14:21   ` Max Reitz
2019-11-08 15:18     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 14/26] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
2019-11-04 14:58   ` Max Reitz
2019-11-08 15:42     ` Alberto Garcia
2019-11-11  8:42       ` Max Reitz
2019-10-26 21:25 ` [RFC PATCH v2 15/26] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
2019-11-04 15:04   ` Max Reitz
2019-11-04 15:10     ` Max Reitz
2019-11-14 15:31       ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 16/26] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
2019-11-04 15:07   ` Max Reitz
2019-11-14 15:33     ` Alberto Garcia
2019-11-14 16:22       ` Max Reitz
2019-10-26 21:25 ` [RFC PATCH v2 17/26] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 18/26] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
2019-11-05 11:05   ` Max Reitz
2019-11-14 15:43     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 19/26] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 20/26] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
2019-11-05 11:43   ` Max Reitz
2019-11-14 16:30     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 21/26] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 22/26] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
2019-11-05 12:05   ` Max Reitz
2019-10-26 21:25 ` [RFC PATCH v2 23/26] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 24/26] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
2019-11-05 12:47   ` Max Reitz
2019-11-15 13:35     ` Alberto Garcia
2019-10-26 21:25 ` [RFC PATCH v2 25/26] qcow2: Allow preallocation and backing files if extended_l2 is set Alberto Garcia
2019-11-05 13:11   ` Max Reitz
2019-10-26 21:25 ` [RFC PATCH v2 26/26] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
2019-11-05 13:32 ` [RFC PATCH v2 00/26] Add subcluster allocation to qcow2 Max Reitz

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git