All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/23] Add subcluster allocation to qcow2
@ 2019-10-15 15:23 Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 01/23] qcow2: Add calculate_l2_meta() Alberto Garcia
                   ` (24 more replies)
  0 siblings, 25 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Hi,

this series adds a new feature to the qcow2 on-disk format called
"Extended L2 Entries", which allows us to do subcluster allocation.

This cover letter explains the reasons behind this proposal, the
changes to the on-disk format, test results and pending work. If you
are curious you can also have a look at previous discussions about
this feature:

   https://lists.gnu.org/archive/html/qemu-block/2017-04/msg00178.html
   https://lists.gnu.org/archive/html/qemu-block/2019-06/msg01155.html

This is the first proper version of the patches, and I believe that
the implementation is complete. However since I'm proposing a change
to the on-disk format I'm labeling this as RFC because I'm expecting
some debate. I'll remove the RFC tag and add more tests in future
revisions.

=== Problem ===

A qcow2 image is divided into units of constant size called clusters,
and among other things it contains metadata that maps guest addresses
to host addresses (the so-called L1 and L2 tables).

There are two basic problems that result from this:

1) Reading from or writing to a qcow2 image involves reading the
   corresponding entry on the L2 table that maps the guest address to
   the host address. This is very slow because it involves two I/O
   operations: one on the L2 table and the other one on the actual
   data cluster.

2) A cluster is the smallest unit of allocation. Therefore writing a
   mere 512 bytes to an empty disk requires allocating a complete
   cluster and filling it with zeroes (or with data from the backing
   image if there is one). This wastes more disk space and also has a
   negative impact on I/O.

Problem (1) can be solved by caching the L2 tables in memory. The
maximum amount of disk space used by L2 tables depends on the virtual
disk size and the cluster size:

   max_l2_size = virtual_disk_size * 8 / cluster_size

Because of this, the only way to reduce the size of the L2 tables is
by increasing the cluster size (which can be any power of two between
512 bytes and 2 MB). But then we hit problem (2): I/O is slower and
more disk space is wasted.

=== The proposal ===

The proposal is to extend the qcow2 format by allowing subcluster
allocation. The on-disk format remains essentially the same, except
that each data cluster is internally divided into 32 subclusters of
equal size.

The way it works in practice is with a new optional feature called
"Extended L2 Entries", that needs to be enabled when an image is
created. With this, each entry on an L2 table is accompanied by a
bitmap indicating the allocation state of each one of the subclusters
for that cluster. The size of an L2 entry doubles from 64 to 128 bits.

Other than L2 entries, all other data structures remain unchanged, but
for data clusters the smallest unit of allocation is now the
subcluster. Reference counting is still at the cluster level, because
there is no way to reference individual subclusters. Copy-on-write on
internal snapshots needs to copy complete clusters, so that scenario
would not benefit from this change.

I see two main use cases for this feature:

a) The qcow2 image is not too large / the L2 cache is not a problem,
   but you want to increase the allocation performance. In this case
   you can have a 128KB cluster with 4KB subclusters (with 4KB being a
   common block size in ext4 and other filesystems)

b) The qcow2 image is very large and you want to save metadata space
   in order to have a smaller L2 cache. In this case you can go for
   the maximum cluster size (2MB) but you want to have smaller
   subclusters to increase the allocation performance and optimize the
   disk usage.

=== Changes to the on-disk format ===

An L2 entry is 64 bits wide, with this format (for uncompressed
clusters):

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
**<----> <--------------------------------------------------><------->*
  Rsrved              host cluster offset of data             Reserved
  (6 bits)                (47 bits)                           (8 bits)

    bit 63: refcount == 1   (QCOW_OFLAG_COPIED)
    bit 62: compressed = 1  (QCOW_OFLAG_COMPRESSED)
    bit  0: all zeros       (QCOW_OFLAG_ZERO)

If Extended L2 Entries are enabled, bit 0 becomes reserved and must be
unset, and this 64-bit bitmap follows the entry:

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<---------------------------------> <--------------------------------->
     subcluster reads as zeros            subcluster is allocated
             (32 bits)                           (32 bits)

All this applies to uncompressed clusters. Compressed clusters are not
divided into subclusters, the cluster descriptor remains exactly the
same, and the 64-bit bitmap is not used (i.e. all bits are always 0).

=== Test results ===

I made all tests on an SSD drive, writing to an empty qcow2 image with
a fully populated 40GB backing image, performing random writes using
fio with a block size of 4KB. I ran the tests with all available
cluster sizes starting from 4KB.

It's important to point out that once a cluster has been completely
allocated then having subclusters offers no performance benefit. For
this reason the size of the image for these tests (40GB) was chosen to
be large enough to guarantee that there are always new clusters being
allocated. This is therefore a worst-case scenario (or best-case for
this feature, if you want).

Subcluster sizes are in brackets:

|-----------------+----------------+-----------------|
|   Cluster size  | subclusters=on | subclusters=off |
|-----------------+----------------+-----------------|
|    4 KB ( N/A ) |            N/A |         95 IOPS |
|    8 KB ( N/A ) |            N/A |        599 IOPS |
|   16 KB (512 B) |      4129 IOPS |       3597 IOPS |
|   32 KB  (1 KB) |     11255 IOPS |       2642 IOPS |
|   64 KB  (2 KB) |     13341 IOPS |       1671 IOPS |
|  128 KB  (4 KB) |     12391 IOPS |        870 IOPS |
|  256 KB  (8 KB) |      9645 IOPS |        566 IOPS |
|  512 KB (16 KB) |      4960 IOPS |        359 IOPS |
| 1024 KB (32 KB) |      2732 IOPS |        215 IOPS |
| 2048 KB (64 KB) |      1630 IOPS |        214 IOPS |
|-----------------+----------------+-----------------|

Here are the same tests, but without any backing image:

|-----------------+----------------+-----------------|
|   Cluster size  | subclusters=on | subclusters=off |
|-----------------+----------------+-----------------|
|    4 KB ( N/A ) |            N/A |         93 IOPS |
|    8 KB ( N/A ) |            N/A |        539 IOPS |
|   16 KB (512 B) |      4174 IOPS |       7598 IOPS |
|   32 KB  (1 KB) |     11326 IOPS |      11957 IOPS |
|   64 KB  (2 KB) |     13516 IOPS |      13375 IOPS |
|  128 KB  (4 KB) |     12435 IOPS |      13274 IOPS |
|  256 KB  (8 KB) |     12071 IOPS |      14174 IOPS |
|  512 KB (16 KB) |     12169 IOPS |      14343 IOPS |
| 1024 KB (32 KB) |     12307 IOPS |      14622 IOPS |
| 2048 KB (64 KB) |     12784 IOPS |      14574 IOPS |
|-----------------+----------------+-----------------|

Some comments about the results:

- The smallest allowed cluster size for an image with subclusters is
  16 KB (in this case the subclusters size is 512 bytes), hence the
  missing values in the 4 KB and 8 KB rows.

- In images with a backing file: allocation is much faster when
  subclusters are enabled. As expected, images with a cluster size of
  64KB perform similar to images with a subcluster size of 64KB. When
  there is no copy-on-write involved (subcluster size <= 4KB) then the
  maximum performance is achieved.

- In images without a backing file: Since commit c8bb23cbdb when empty
  clusters are allocated for the first time they are filled with
  zeroes using an efficient method (typically fallocate() with
  FALLOC_FL_ZERO_RANGE). This is so fast that having subclusters here
  is actually a bit slower in most cases (although it still saves disk
  space).

- The 16 KB cluster / 512 byte subcluster case is quite slow.
  I haven't debugged this but I suspect that this is because new
  clusters need to be allocated all the time, and also L2 and refcount
  tables are very small and need to grow all the time. The same pattern
  can be seen in images without subclusters.

=== To do ===

A couple of things are missing from this series:

- The ability to efficiently zero individual subclusters using
  qcow2_co_pwrite_zeroes(). At the moment only full clusters can be
  zeroed with this method.

- Alternatively we could get rid of the individual "all zeroes" bits
  altogether and have 64 subclusters per cluster. We would still have
  the QCOW_OFLAG_ZERO bit in the standard cluster descriptor.

- The number of subclusters per cluster is always 32. It would be
  trivial to allow configuring this, but I don't see any use case.

- Tests: I have a few written that I'll add in future revisions of
  this series.

- handle_alloc_space() works at the subclusters level. That is, if you
  have an unallocated 2MB cluster with 64KB subclusters, no backing
  image and you write 4KB of data, QEMU won't write zeroes to the
  affected subcluster(s) and will use handle_alloc_space() instead.
  The other subclusters won't be touched and will remain unallocated.
  This behavior is consistent with how subclusters work and saves disk
  space, but offers slightly lower performance (see test results
  above). Theoretically we could offer a setting to configure this,
  but I'm not convinced that this is very useful.

===========================

As usual, feedback is welcome,

Berto

Alberto Garcia (23):
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add qcow2_get_subcluster_type()
  qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  qcow2: Add subcluster support to calculate_l2_meta()
  qcow2: Add subcluster support to qcow2_get_cluster_offset()
  qcow2: Add subcluster support to zero_in_l2_slice()
  qcow2: Add subcluster support to discard_in_l2_slice()
  qcow2: Add subcluster support to check_refcounts_l2()
  qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  qcow2: Fix offset calculation in handle_dependencies()
  qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  qcow2: Add subcluster support to handle_alloc_space()
  qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit

 block/qcow2-cluster.c            | 547 ++++++++++++++++++++-----------
 block/qcow2-refcount.c           |  38 ++-
 block/qcow2.c                    |  83 ++++-
 block/qcow2.h                    | 121 ++++++-
 docs/interop/qcow2.txt           |  68 +++-
 docs/qcow2-cache.txt             |  19 +-
 include/block/block_int.h        |   1 +
 qapi/block-core.json             |   2 +
 tests/qemu-iotests/031.out       |   8 +-
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 +++---
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +-
 tests/qemu-iotests/065           |  18 +-
 tests/qemu-iotests/082.out       |  48 ++-
 tests/qemu-iotests/085.out       |  38 +--
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +-
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 +
 tests/qemu-iotests/242.out       |   5 +
 tests/qemu-iotests/255.out       |   8 +-
 tests/qemu-iotests/common.filter |   1 +
 24 files changed, 817 insertions(+), 335 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH 01/23] qcow2: Add calculate_l2_meta()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 02/23] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 76 +++++++++++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 24 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8d5fa1539c..fe2523ed66 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1019,6 +1019,55 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
                         QCOW2_DISCARD_NEVER);
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure and
+ * add it to @m.
+ *
+ * @host_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
+                              uint64_t guest_offset, uint64_t bytes,
+                              QCowL2Meta **m, bool keep_old)
+{
+    BDRVQcow2State *s = bs->opaque;
+    unsigned cow_start_from = 0;
+    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+    unsigned cow_end_from = cow_start_to + bytes;
+    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+    QCowL2Meta *old_m = *m;
+
+    *m = g_malloc0(sizeof(**m));
+    **m = (QCowL2Meta) {
+        .next           = old_m,
+
+        .alloc_offset   = host_offset,
+        .offset         = start_of_cluster(s, guest_offset),
+        .nb_clusters    = nb_clusters,
+
+        .keep_old_clusters = keep_old,
+
+        .cow_start = {
+            .offset     = cow_start_from,
+            .nb_bytes   = cow_start_to - cow_start_from,
+        },
+        .cow_end = {
+            .offset     = cow_end_from,
+            .nb_bytes   = cow_end_to - cow_end_from,
+        },
+    };
+
+    qemu_co_queue_init(&(*m)->dependent_requests);
+    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1414,35 +1463,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
     int avail_bytes = MIN(INT_MAX, nb_clusters << s->cluster_bits);
     int nb_bytes = MIN(requested_bytes, avail_bytes);
-    QCowL2Meta *old_m = *m;
-
-    *m = g_malloc0(sizeof(**m));
-
-    **m = (QCowL2Meta) {
-        .next           = old_m,
-
-        .alloc_offset   = alloc_cluster_offset,
-        .offset         = start_of_cluster(s, guest_offset),
-        .nb_clusters    = nb_clusters,
-
-        .keep_old_clusters  = keep_old_clusters,
-
-        .cow_start = {
-            .offset     = 0,
-            .nb_bytes   = offset_into_cluster(s, guest_offset),
-        },
-        .cow_end = {
-            .offset     = nb_bytes,
-            .nb_bytes   = avail_bytes - nb_bytes,
-        },
-    };
-    qemu_co_queue_init(&(*m)->dependent_requests);
-    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 
     *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                      m, keep_old_clusters);
+
     return 1;
 
 fail:
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 02/23] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 01/23] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 03/23] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

We are going to need it in other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fe2523ed66..f462e169c0 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1068,6 +1068,24 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+    switch (qcow2_get_cluster_type(bs, l2_entry)) {
+    case QCOW2_CLUSTER_NORMAL:
+        if (l2_entry & QCOW_OFLAG_COPIED) {
+            return false;
+        }
+    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return true;
+    default:
+        abort();
+    }
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1080,25 +1098,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
 
     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-        switch(cluster_type) {
-        case QCOW2_CLUSTER_NORMAL:
-            if (l2_entry & QCOW_OFLAG_COPIED) {
-                goto out;
-            }
+        if (!cluster_needs_cow(bs, l2_entry)) {
             break;
-        case QCOW2_CLUSTER_UNALLOCATED:
-        case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
-            break;
-        default:
-            abort();
         }
     }
 
-out:
     assert(i <= nb_clusters);
     return i;
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 03/23] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 01/23] qcow2: Add calculate_l2_meta() Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 02/23] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 04/23] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
     allocated and have a reference count of 1. In those clusters we
     can simply write the data and there is no need to perform any
     copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
     either because they haven't been allocated yet, because their
     reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those are clusters that
are already allocated and they could perfectly be dealt with in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
     which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
     reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
     allocation (all other cases).

One importante difference after this change is that clusters found in
handle_copied() may now require copy-on-write, but this will be anyway
necessary once we add support for subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 177 +++++++++++++++++++++++-------------------
 1 file changed, 96 insertions(+), 81 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f462e169c0..70b2e32f7e 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1021,7 +1021,8 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure and
- * add it to @m.
+ * add it to @m. If the write request does not need copy-on-write or
+ * changes to the L2 metadata then this function does nothing.
  *
  * @host_offset points to the beginning of the first cluster.
  *
@@ -1034,15 +1035,51 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  */
 static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
                               uint64_t guest_offset, uint64_t bytes,
-                              QCowL2Meta **m, bool keep_old)
+                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned cow_start_from = 0;
+    int l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry;
+    unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
-    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
+    QCow2ClusterType type;
+
+    /* Return if there's no COW (all clusters are normal and we keep them) */
+    if (keep_old) {
+        int i;
+        for (i = 0; i < nb_clusters; i++) {
+            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+                break;
+            }
+        }
+        if (i == nb_clusters) {
+            return;
+        }
+    }
+
+    /* Get the L2 entry from the first cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_start_from = cow_start_to;
+    } else {
+        cow_start_from = 0;
+    }
+
+    /* Get the L2 entry from the last cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_end_to = cow_end_from;
+    } else {
+        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    }
 
     *m = g_malloc0(sizeof(**m));
     **m = (QCowL2Meta) {
@@ -1068,18 +1105,18 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing to a cluster requires COW */
+/* Returns true if the cluster is unallocated or has refcount > 1 */
 static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 {
     switch (qcow2_get_cluster_type(bs, l2_entry)) {
     case QCOW2_CLUSTER_NORMAL:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
         if (l2_entry & QCOW_OFLAG_COPIED) {
             return false;
         }
     case QCOW2_CLUSTER_UNALLOCATED:
     case QCOW2_CLUSTER_COMPRESSED:
     case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
         return true;
     default:
         abort();
@@ -1087,20 +1124,34 @@ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 }
 
 /*
- * Returns the number of contiguous clusters that can be used for an allocating
- * write, but require COW to be performed (this includes yet unallocated space,
- * which must copy from the backing file)
+ * Returns the number of contiguous clusters that can be written to
+ * using one single write request, starting from @l2_index.
+ * At most @nb_clusters are checked.
+ *
+ * If @want_cow is true this counts clusters that are either
+ * unallocated, or allocated but with refcount > 1.
+ *
+ * If @want_cow is false this counts clusters that are already
+ * allocated and can be written to using their current locations
+ * (including QCOW2_CLUSTER_ZERO_ALLOC).
  */
 static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
-    uint64_t *l2_slice, int l2_index)
+                              uint64_t *l2_slice, int l2_index, bool want_cow)
 {
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        if (!cluster_needs_cow(bs, l2_entry)) {
+        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        if (cluster_needs_cow(bs, l2_entry) != want_cow) {
             break;
         }
+        if (!want_cow && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            break;
+        }
+        expected_offset += s->cluster_size;
     }
 
     assert(i <= nb_clusters);
@@ -1228,18 +1279,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     cluster_offset = be64_to_cpu(l2_slice[l2_index]);
 
-    /* Check how many clusters are already allocated and don't need COW */
-    if (qcow2_get_cluster_type(bs, cluster_offset) == QCOW2_CLUSTER_NORMAL
-        && (cluster_offset & QCOW_OFLAG_COPIED))
-    {
+    if (!cluster_needs_cow(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
         bool offset_matches =
             (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
 
         if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
+            qcow2_signal_corruption(bs, true, -1, -1, "%s cluster offset "
                                     "%#llx unaligned (guest offset: %#" PRIx64
-                                    ")", cluster_offset & L2E_OFFSET_MASK,
+                                    ")", cluster_offset & QCOW_OFLAG_ZERO ?
+                                    "Preallocated zero" : "Data",
+                                    cluster_offset & L2E_OFFSET_MASK,
                                     guest_offset);
             ret = -EIO;
             goto out;
@@ -1252,15 +1302,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         }
 
         /* We keep all QCOW_OFLAG_COPIED clusters */
-        keep_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index],
-                                      QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
+        keep_clusters = count_cow_clusters(bs, nb_clusters, l2_slice,
+                                           l2_index, false);
         assert(keep_clusters <= nb_clusters);
 
         *bytes = MIN(*bytes,
                  keep_clusters * s->cluster_size
                  - offset_into_cluster(s, guest_offset));
+        assert(*bytes != 0);
+
+        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
+                          *bytes, l2_slice, m, true);
 
         ret = 1;
     } else {
@@ -1361,12 +1413,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     BDRVQcow2State *s = bs->opaque;
     int l2_index;
     uint64_t *l2_slice;
-    uint64_t entry;
     uint64_t nb_clusters;
     int ret;
-    bool keep_old_clusters = false;
 
-    uint64_t alloc_cluster_offset = INV_OFFSET;
+    uint64_t alloc_cluster_offset;
 
     trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
                              *bytes);
@@ -1389,67 +1439,31 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    entry = be64_to_cpu(l2_slice[l2_index]);
-    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index);
+    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index, true);
 
     /* This function is only called when there were no non-COW clusters, so if
      * we can't find any unallocated or COW clusters either, something is
      * wrong with our code. */
     assert(nb_clusters > 0);
 
-    if (qcow2_get_cluster_type(bs, entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
-        (entry & QCOW_OFLAG_COPIED) &&
-        (*host_offset == INV_OFFSET ||
-         start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
-    {
-        int preallocated_nb_clusters;
-
-        if (offset_into_cluster(s, entry & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Preallocated zero "
-                                    "cluster offset %#llx unaligned (guest "
-                                    "offset: %#" PRIx64 ")",
-                                    entry & L2E_OFFSET_MASK, guest_offset);
-            ret = -EIO;
-            goto fail;
-        }
-
-        /* Try to reuse preallocated zero clusters; contiguous normal clusters
-         * would be fine, too, but count_cow_clusters() above has limited
-         * nb_clusters already to a range of COW clusters */
-        preallocated_nb_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_COPIED);
-        assert(preallocated_nb_clusters > 0);
-
-        nb_clusters = preallocated_nb_clusters;
-        alloc_cluster_offset = entry & L2E_OFFSET_MASK;
-
-        /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
-         * should not free them. */
-        keep_old_clusters = true;
+    /* Allocate, if necessary at a given offset in the image file */
+    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
+        start_of_cluster(s, *host_offset);
+    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
+                                  &nb_clusters);
+    if (ret < 0) {
+        goto out;
     }
 
-    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-
-    if (alloc_cluster_offset == INV_OFFSET) {
-        /* Allocate, if necessary at a given offset in the image file */
-        alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
-                               start_of_cluster(s, *host_offset);
-        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
-                                      &nb_clusters);
-        if (ret < 0) {
-            goto fail;
-        }
-
-        /* Can't extend contiguous allocation */
-        if (nb_clusters == 0) {
-            *bytes = 0;
-            return 0;
-        }
-
-        assert(alloc_cluster_offset != INV_OFFSET);
+    /* Can't extend contiguous allocation */
+    if (nb_clusters == 0) {
+        *bytes = 0;
+        ret = 0;
+        goto out;
     }
 
+    assert(alloc_cluster_offset != INV_OFFSET);
+
     /*
      * Save info needed for meta data update.
      *
@@ -1472,13 +1486,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
-                      m, keep_old_clusters);
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
+                      m, false);
 
-    return 1;
+    ret = 1;
 
-fail:
-    if (*m && (*m)->nb_clusters > 0) {
+out:
+    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
+    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
         QLIST_REMOVE(*m, next_in_flight);
     }
     return ret;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 04/23] qcow2: Add get_l2_entry() and set_l2_entry()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (2 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 03/23] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 05/23] qcow2: Document the Extended L2 Entries feature Alberto Garcia
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c  | 65 ++++++++++++++++++++++--------------------
 block/qcow2-refcount.c | 17 +++++------
 block/qcow2.h          | 12 ++++++++
 3 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 70b2e32f7e..b2045d51bf 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -379,12 +379,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
     QCow2ClusterType first_cluster_type;
     uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t offset = first_entry & mask;
 
     first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -397,7 +398,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
            first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
         if (offset + (uint64_t) i * cluster_size != l2_entry) {
             break;
         }
@@ -413,14 +414,16 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
                                                  int nb_clusters,
                                                  uint64_t *l2_slice,
+                                                 int l2_index,
                                                  QCow2ClusterType wanted_type)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
 
     assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
            wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = be64_to_cpu(l2_slice[i]);
+        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
         QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
         if (type != wanted_type) {
@@ -566,7 +569,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
-    *cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    *cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -601,14 +604,14 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
         c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  &l2_slice[l2_index], type);
+                                                  l2_slice, l2_index, type);
         *cluster_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_ZERO);
+                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -761,7 +764,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     /* Compression can't overwrite anything. Fail if the cluster was already
      * allocated. */
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
     if (cluster_offset & L2E_OFFSET_MASK) {
         qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
         return -EIO;
@@ -786,7 +789,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-    l2_slice[l2_index] = cpu_to_be64(cluster_offset);
+    set_l2_entry(s, l2_slice, l2_index, cluster_offset);
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
@@ -978,12 +981,12 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
          * cluster the second one has to do RMW (which is done above by
          * perform_cow()), update l2 table with its cluster pointer and free
          * old cluster. This is what this loop does */
-        if (l2_slice[l2_index + i] != 0) {
-            old_cluster[j++] = l2_slice[l2_index + i];
+        if (get_l2_entry(s, l2_slice, l2_index + i) != 0) {
+            old_cluster[j++] = get_l2_entry(s, l2_slice, l2_index + i);
         }
 
-        l2_slice[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+        set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
+                     (cluster_offset + (i << s->cluster_bits)));
      }
 
 
@@ -997,8 +1000,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      */
     if (!m->keep_old_clusters && j != 0) {
         for (i = 0; i < j; i++) {
-            qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
-                                    QCOW2_DISCARD_NEVER);
+            qcow2_free_any_clusters(bs, old_cluster[i], 1, QCOW2_DISCARD_NEVER);
         }
     }
 
@@ -1051,7 +1053,7 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     if (keep_old) {
         int i;
         for (i = 0; i < nb_clusters; i++) {
-            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
             if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
                 break;
             }
@@ -1062,7 +1064,7 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     }
 
     /* Get the L2 entry from the first cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1072,7 +1074,7 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     }
 
     /* Get the L2 entry from the last cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1139,12 +1141,12 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
                               uint64_t *l2_slice, int l2_index, bool want_cow)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         if (cluster_needs_cow(bs, l2_entry) != want_cow) {
             break;
         }
@@ -1277,7 +1279,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     if (!cluster_needs_cow(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
@@ -1655,7 +1657,7 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     for (i = 0; i < nb_clusters; i++) {
         uint64_t old_l2_entry;
 
-        old_l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * If full_discard is false, make sure that a discarded area reads back
@@ -1695,9 +1697,9 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (!full_discard && s->qcow_version >= 3) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
-            l2_slice[l2_index + i] = cpu_to_be64(0);
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
         }
 
         /* Then decrease the refcount */
@@ -1777,7 +1779,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         uint64_t old_offset;
         QCow2ClusterType cluster_type;
 
-        old_offset = be64_to_cpu(l2_slice[l2_index + i]);
+        old_offset = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * Minimize L2 changes if the cluster already reads back as
@@ -1791,10 +1793,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            l2_slice[l2_index + i] |= cpu_to_be64(QCOW_OFLAG_ZERO);
+            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
+            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
         }
     }
 
@@ -1932,7 +1935,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             }
 
             for (j = 0; j < s->l2_slice_size; j++) {
-                uint64_t l2_entry = be64_to_cpu(l2_slice[j]);
+                uint64_t l2_entry = get_l2_entry(s, l2_slice, j);
                 int64_t offset = l2_entry & L2E_OFFSET_MASK;
                 QCow2ClusterType cluster_type =
                     qcow2_get_cluster_type(bs, l2_entry);
@@ -1946,7 +1949,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                     if (!bs->backing) {
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
-                        l2_slice[j] = 0;
+                        set_l2_entry(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
                     }
@@ -2009,9 +2012,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 }
 
                 if (l2_refcount == 1) {
-                    l2_slice[j] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
+                    set_l2_entry(s, l2_slice, j, offset | QCOW_OFLAG_COPIED);
                 } else {
-                    l2_slice[j] = cpu_to_be64(offset);
+                    set_l2_entry(s, l2_slice, j, offset);
                 }
                 l2_dirty = true;
             }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index ef965d7895..14f71df7da 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1309,7 +1309,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     uint64_t cluster_index;
                     uint64_t offset;
 
-                    entry = be64_to_cpu(l2_slice[j]);
+                    entry = get_l2_entry(s, l2_slice, j);
                     old_entry = entry;
                     entry &= ~QCOW_OFLAG_COPIED;
                     offset = entry & L2E_OFFSET_MASK;
@@ -1383,7 +1383,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                             qcow2_cache_set_dependency(bs, s->l2_table_cache,
                                                        s->refcount_block_cache);
                         }
-                        l2_slice[j] = cpu_to_be64(entry);
+                        set_l2_entry(s, l2_slice, j, entry);
                         qcow2_cache_entry_mark_dirty(s->l2_table_cache,
                                                      l2_slice);
                     }
@@ -1616,7 +1616,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 
     /* Do the actual checks */
     for(i = 0; i < s->l2_size; i++) {
-        l2_entry = be64_to_cpu(l2_table[i]);
+        l2_entry = get_l2_entry(s, l2_table, i);
 
         switch (qcow2_get_cluster_type(bs, l2_entry)) {
         case QCOW2_CLUSTER_COMPRESSED:
@@ -1685,7 +1685,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
-                        l2_table[i] = cpu_to_be64(l2_entry);
+                        set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, sizeof(uint64_t), false);
                         if (ret < 0) {
@@ -1913,7 +1913,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         for (j = 0; j < s->l2_size; j++) {
-            uint64_t l2_entry = be64_to_cpu(l2_table[j]);
+            uint64_t l2_entry = get_l2_entry(s, l2_table, j);
             uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
             QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
 
@@ -1936,9 +1936,10 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                             "l2_entry=%" PRIx64 " refcount=%" PRIu64 "\n",
                             repair ? "Repairing" : "ERROR", l2_entry, refcount);
                     if (repair) {
-                        l2_table[j] = cpu_to_be64(refcount == 1
-                                    ? l2_entry |  QCOW_OFLAG_COPIED
-                                    : l2_entry & ~QCOW_OFLAG_COPIED);
+                        set_l2_entry(s, l2_table, j,
+                                     refcount == 1 ?
+                                     l2_entry |  QCOW_OFLAG_COPIED :
+                                     l2_entry & ~QCOW_OFLAG_COPIED);
                         l2_dirty++;
                     }
                 }
diff --git a/block/qcow2.h b/block/qcow2.h
index a488d761ff..0b68c55c01 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -485,6 +485,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                    int idx)
+{
+    return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                int idx, uint64_t entry)
+{
+    l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 05/23] qcow2: Document the Extended L2 Entries feature
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (3 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 04/23] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 06/23] qcow2: Add dummy has_subclusters() function Alberto Garcia
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
 docs/qcow2-cache.txt   | 19 +++++++++++-
 2 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..d34261f955 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
                     as the maximum cluster size and won't be able to open images
                     with larger cluster sizes.
 
+                    Note: if the image has Extended L2 Entries then cluster_bits
+                    must be at least 14 (i.e. 16384 byte clusters).
+
          24 - 31:   size
                     Virtual disk size in bytes.
 
@@ -109,7 +112,12 @@ in the description of a field.
                                 An External Data File Name header extension may
                                 be present if this bit is set.
 
-                    Bits 3-63:  Reserved (set to 0)
+                    Bit 3:      Extended L2 Entries.  If this bit is set then
+                                L2 table entries use an extended format that
+                                allows subcluster-based allocation. See the
+                                Extended L2 Entries section for more details.
+
+                    Bits 4-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-    l2_entries = (cluster_size / sizeof(uint64_t))
+    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
 
     l2_index = (offset / cluster_size) % l2_entries
     l1_index = (offset / cluster_size) / l2_entries
@@ -447,6 +455,8 @@ obtained as follows:
 
     return cluster_offset + (offset % cluster_size)
 
+    [*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
     Bit  0 -  8:    Reserved (set to 0)
@@ -487,7 +497,8 @@ Standard Cluster Descriptor:
                     nor is data read from the backing file if the cluster is
                     unallocated.
 
-                    With version 2, this is always 0.
+                    With version 2 or with extended L2 entries (see the next
+                    section), this is always 0.
 
          1 -  8:    Reserved (set to 0)
 
@@ -524,6 +535,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated like in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+    Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+                    1: the subcluster is allocated. In this case the
+                       host cluster offset field must contain a valid
+                       offset.
+                    0: the subcluster is not allocated. In this case
+                       read requests shall go to the backing file or
+                       return zeros if there is no backing file data.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 31 - x)
+
+        32 -  63    Subcluster reads as zeros (one bit per subcluster)
+
+                    1: the subcluster reads as zeros. In this case the
+                       allocation status bit must be unset. The host
+                       cluster offset field may or may not be set.
+                    0: no effect.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 63 - x)
+
+Subcluster Allocation Bitmap (for compressed clusters):
+
+    Bit  0 -  63:   Reserved (set to 0)
+                    Compressed clusters don't have subclusters,
+                    so this field is not used.
 
 == Snapshots ==
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index d57f409861..04eb4ce2f1 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@
 qcow2 L2/refcount cache configuration
 =====================================
-Copyright (C) 2015, 2018 Igalia, S.L.
+Copyright (C) 2015, 2018-2019 Igalia, S.L.
 Author: Alberto Garcia <berto@igalia.com>
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -222,3 +222,20 @@ support this functionality, and is 0 (disabled) on other platforms.
 This functionality currently relies on the MADV_DONTNEED argument for
 madvise() to actually free the memory. This is a Linux-specific feature,
 so cache-clean-interval is not supported on other systems.
+
+
+Extended L2 Entries
+-------------------
+All numbers shown in this document are valid for qcow2 images with normal
+64-bit L2 entries.
+
+Images with extended L2 entries need twice as much L2 metadata, so the L2
+cache size must be twice as large for the same disk space.
+
+   disk_size = l2_cache_size * cluster_size / 16
+
+i.e.
+
+   l2_cache_size = disk_size * 16 / cluster_size
+
+Refcount blocks are not affected by this.
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 06/23] qcow2: Add dummy has_subclusters() function
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (4 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 05/23] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 07/23] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all patches needed for
subcluster support are ready then QEMU will be able to create and
read images with subclusters and this function will return the actual
value.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0b68c55c01..6d6fc57f41 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -485,6 +485,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+    /* FIXME: Return false until this feature is complete */
+    return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 07/23] qcow2: Add subcluster-related fields to BDRVQcow2State
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (5 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 06/23] qcow2: Add dummy has_subclusters() function Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 08/23] qcow2: Add offset_to_sc_index() Alberto Garcia
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 5 +++++
 block/qcow2.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4d16393e61..be9854c5ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1341,6 +1341,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->subclusters_per_cluster =
+        has_subclusters(s) ? QCOW_MAX_SUBCLUSTERS_PER_CLUSTER : 1;
+    s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+    s->subcluster_bits = ctz32(s->subcluster_size);
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
diff --git a/block/qcow2.h b/block/qcow2.h
index 6d6fc57f41..e6486a2cf8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -72,6 +72,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -274,6 +276,9 @@ typedef struct BDRVQcow2State {
     int cluster_bits;
     int cluster_size;
     int l2_slice_size;
+    int subcluster_bits;
+    int subcluster_size;
+    int subclusters_per_cluster;
     int l2_bits;
     int l2_size;
     int l1_size;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 08/23] qcow2: Add offset_to_sc_index()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (6 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 07/23] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 09/23] qcow2: Add l2_entry_size() Alberto Garcia
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

For a given offset, return the subcluster number within its cluster
(i.e. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index e6486a2cf8..c450267c88 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -556,6 +556,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State *s, int64_t offset)
     return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+    return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
     return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 09/23] qcow2: Add l2_entry_size()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (7 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 08/23] qcow2: Add offset_to_sc_index() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 10/23] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we cannot assume that L2 entries are sizeof(uint64_t)
anymore. This function returns the proper value for the image.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c  | 12 ++++++------
 block/qcow2-refcount.c | 14 ++++++++------
 block/qcow2.c          |  6 +++---
 block/qcow2.h          |  5 +++++
 4 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index b2045d51bf..67f90e415d 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -209,7 +209,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
                    uint64_t l2_offset, uint64_t **l2_slice)
 {
     BDRVQcow2State *s = bs->opaque;
-    int start_of_slice = sizeof(uint64_t) *
+    int start_of_slice = l2_entry_size(s) *
         (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
     return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -277,7 +277,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new l2 entry */
 
-    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
     if (l2_offset < 0) {
         ret = l2_offset;
         goto fail;
@@ -301,7 +301,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new entry in the l2 cache */
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -365,7 +365,7 @@ fail:
     }
     s->l1_table[l1_index] = old_l2_offset;
     if (l2_offset > 0) {
-        qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+        qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                             QCOW2_DISCARD_ALWAYS);
     }
     return ret;
@@ -708,7 +708,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
         /* Then decrease the refcount of the old table */
         if (l2_offset) {
-            qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+            qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                                 QCOW2_DISCARD_OTHER);
         }
 
@@ -1880,7 +1880,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     int ret;
     int i, j;
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 14f71df7da..a2c4d36378 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1253,7 +1253,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     l2_slice = NULL;
     l1_table = NULL;
     l1_size2 = l1_size * sizeof(uint64_t);
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     s->cache_discards = true;
@@ -1604,7 +1604,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size, nb_csectors, ret;
 
     /* Read L2 table from disk */
-    l2_size = s->l2_size * sizeof(uint64_t);
+    l2_size = s->l2_size * l2_entry_size(s);
     l2_table = g_malloc(l2_size);
 
     ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1679,15 +1679,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                             fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
                             offset);
                     if (fix & BDRV_FIX_ERRORS) {
+                        int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
                         uint64_t l2e_offset =
-                            l2_offset + (uint64_t)i * sizeof(uint64_t);
+                            l2_offset + (uint64_t)i * l2_entry_size(s);
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
                         set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
-                                l2e_offset, sizeof(uint64_t), false);
+                                l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Overlap check failed\n");
                             res->check_errors++;
@@ -1697,7 +1698,8 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         }
 
                         ret = bdrv_pwrite_sync(bs->file, l2e_offset,
-                                               &l2_table[i], sizeof(uint64_t));
+                                               &l2_table[idx],
+                                               l2_entry_size(s));
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Failed to overwrite L2 "
                                     "table entry: %s\n", strerror(-ret));
@@ -1904,7 +1906,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         ret = bdrv_pread(bs->file, l2_offset, l2_table,
-                         s->l2_size * sizeof(uint64_t));
+                         s->l2_size * l2_entry_size(s));
         if (ret < 0) {
             fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
                     strerror(-ret));
diff --git a/block/qcow2.c b/block/qcow2.c
index be9854c5ea..131711d6fa 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -831,7 +831,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
     /* An L2 table is always one cluster in size so the max cache size
      * should be a multiple of the cluster size. */
-    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
+    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * l2_entry_size(s),
                                      s->cluster_size);
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
@@ -992,7 +992,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
         }
     }
 
-    r->l2_slice_size = l2_cache_entry_size / sizeof(uint64_t);
+    r->l2_slice_size = l2_cache_entry_size / l2_entry_size(s);
     r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size,
                                            l2_cache_entry_size);
     r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size,
@@ -1386,7 +1386,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         bs->encrypted = true;
     }
 
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    s->l2_bits = s->cluster_bits - ctz32(l2_entry_size(s));
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
     s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
diff --git a/block/qcow2.h b/block/qcow2.h
index c450267c88..9a7648af47 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -496,6 +496,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
     return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+    return has_subclusters(s) ? sizeof(uint64_t) * 2 : sizeof(uint64_t);
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 10/23] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (8 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 09/23] qcow2: Add l2_entry_size() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 11/23] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

In order to support them correctly get/set_l2_entry() need to be
updated so they take the entry width into account in order to
calculate the correct offset.

This patch also adds the get/set_l2_bitmap() functions that are used
to access the bitmaps. For convenience, these functions are no-ops
when used in traditional qcow2 images.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9a7648af47..d9fe883fe0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -504,15 +504,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                     int idx)
+{
+    if (has_subclusters(s)) {
+        idx *= l2_entry_size(s) / sizeof(uint64_t);
+        return be64_to_cpu(l2_slice[idx + 1]);
+    } else {
+        return 0;
+    }
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                 int idx, uint64_t entry)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                 int idx, uint64_t bitmap)
+{
+    if (has_subclusters(s)) {
+        idx *= l2_entry_size(s) / sizeof(uint64_t);
+        l2_slice[idx + 1] = cpu_to_be64(bitmap);
+    }
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 11/23] qcow2: Add qcow2_get_subcluster_type()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (9 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 10/23] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 12/23] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

This function returns the type of an individual subcluster. If an
image does not have subclusters then this returns the exact same value
as qcow2_get_cluster_type().

The information in standard and extended L2 entries is encoded in a
slightly different way, but all existing QCow2ClusterType values are
also valid for subclusters and have the same meanings (although they
typically only apply to the requested subcluster).

There are two important exceptions to this:

  a) QCOW2_CLUSTER_COMPRESSED means that the whole cluster is
     compressed. We do not support compression at the subcluster
     level.

  b) QCOW2_CLUSTER_UNALLOCATED means that the cluster is unallocated,
     that is, the offset field of the L2 entry does not point to a
     host cluster. All subclusters are obviously unallocated too but
     any of them could be of type QCOW2_CLUSTER_ZERO_PLAIN.

In addition to that, extended L2 entries allow one new scenario where
the cluster is normally allocated but an individual subcluster is not.
This is very different from (b) and because of that this patch adds a
new value called QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER.

As a last thing, this patch adds QCOW2_CLUSTER_INVALID to detect the
cases where an L2 entry has a value that violates the spec. The caller
is responsible for handling these situations.

To prevent compatibility problems with images that have invalid values
but are currently being read by QEMU without causing side effects,
QCOW2_CLUSTER_INVALID is only returned for images with extended L2
entries.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index d9fe883fe0..60e4bf963e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -74,6 +74,15 @@
 
 #define QCOW_MAX_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)    ((1ULL << 63) >> (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   ((1ULL << 31) >> (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xFFFFFFFF00000000ULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0x00000000FFFFFFFFULL
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -435,10 +444,12 @@ typedef struct QCowL2Meta
 
 typedef enum QCow2ClusterType {
     QCOW2_CLUSTER_UNALLOCATED,
+    QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER,
     QCOW2_CLUSTER_ZERO_PLAIN,
     QCOW2_CLUSTER_ZERO_ALLOC,
     QCOW2_CLUSTER_NORMAL,
     QCOW2_CLUSTER_COMPRESSED,
+    QCOW2_CLUSTER_INVALID,
 } QCow2ClusterType;
 
 typedef enum QCow2MetadataOverlap {
@@ -618,6 +629,57 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }
 
+/* In an image without subsclusters this returns the same value as
+ * qcow2_get_cluster_type() */
+static inline int qcow2_get_subcluster_type(BlockDriverState *bs,
+                                            uint64_t l2_entry,
+                                            uint64_t l2_bitmap,
+                                            unsigned sc_index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+    assert(sc_index < s->subclusters_per_cluster);
+
+    if (has_subclusters(s)) {
+        bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+        bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            if (l2_bitmap != 0) {
+                return QCOW2_CLUSTER_INVALID;
+            }
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_CLUSTER_INVALID;
+        case QCOW2_CLUSTER_NORMAL:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_CLUSTER_NORMAL;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_ZERO_ALLOC;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_CLUSTER_INVALID;
+            }
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_UNALLOCATED;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_CLUSTER_INVALID;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_CLUSTER_ZERO_PLAIN;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_CLUSTER_INVALID;
+            }
+        default:
+            g_assert_not_reached();
+        }
+    }
+
+    return type;
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 12/23] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (10 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 11/23] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 13/23] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

In the previous patch we added a new QCow2ClusterType named
QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER. There is a couple of places
where this new value needs to be handled, and that is what this patch
does.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 131711d6fa..c222cd261d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1922,8 +1922,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
     *pnum = bytes;
 
-    if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
-        !s->crypto) {
+    if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC ||
+         ret == QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) && !s->crypto) {
         index_in_cluster = offset & (s->cluster_size - 1);
         *map = cluster_offset | index_in_cluster;
         *file = s->data_file->bs;
@@ -1931,7 +1931,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     }
     if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+    } else if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+               ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2009,6 +2010,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
         switch (ret) {
         case QCOW2_CLUSTER_UNALLOCATED:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
 
             if (bs->backing) {
                 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -3542,6 +3544,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
         if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+            ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER &&
             ret != QCOW2_CLUSTER_ZERO_PLAIN &&
             ret != QCOW2_CLUSTER_ZERO_ALLOC) {
             qemu_co_mutex_unlock(&s->lock);
@@ -3612,6 +3615,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
         switch (ret) {
         case QCOW2_CLUSTER_UNALLOCATED:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 13/23] qcow2: Add subcluster support to calculate_l2_meta()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (11 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 12/23] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 14/23] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

   - If the cluster is new, then subclusters #0 to #3 from the old
     cluster must be copied into the new one.

   - If the cluster is new but the old cluster was unallocated, then
     only subcluster #3 needs copy-on-write. #0 to #2 are marked as
     unallocated in the bitmap of the new L2 entry.

   - If we are overwriting an old cluster and subcluster #3 is
     unallocated or has the all-zeroes bit set then we need
     copy-on-write on subcluster #3.

   - If we are overwriting an old cluster and subcluster #3 was
     allocated then there is no need to copy-on-write.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 136 +++++++++++++++++++++++++++++++++---------
 1 file changed, 108 insertions(+), 28 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 67f90e415d..8df0f67316 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1034,14 +1034,16 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 1 on success, -errno on failure.
  */
-static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
-                              uint64_t guest_offset, uint64_t bytes,
-                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
+                             uint64_t guest_offset, uint64_t bytes,
+                             uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    int l2_index = offset_to_l2_slice_index(s, guest_offset);
-    uint64_t l2_entry;
+    int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry, l2_bitmap;
     unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
@@ -1049,38 +1051,108 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
     QCowL2Meta *old_m = *m;
     QCow2ClusterType type;
 
-    /* Return if there's no COW (all clusters are normal and we keep them) */
+    /* Return if there's no COW (all subclusters are normal and we are
+     * keeping the clusters) */
     if (keep_old) {
+        unsigned first_sc = cow_start_to / s->subcluster_size;
+        unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
         int i;
-        for (i = 0; i < nb_clusters; i++) {
-            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+        for (i = first_sc; i <= last_sc; i++) {
+            unsigned c = i / s->subclusters_per_cluster;
+            unsigned sc = i % s->subclusters_per_cluster;
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+            l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+            type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+            if (type == QCOW2_CLUSTER_INVALID) {
+                l2_index += c; /* Point to the invalid entry */
+                goto fail;
+            }
+            if (type != QCOW2_CLUSTER_NORMAL) {
                 break;
             }
         }
-        if (i == nb_clusters) {
-            return;
+        if (i == last_sc + 1) {
+            return 1;
         }
     }
 
     /* Get the L2 entry from the first cluster */
     l2_entry = get_l2_entry(s, l2_slice, l2_index);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_start_from = cow_start_to;
+    if (type == QCOW2_CLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_CLUSTER_COMPRESSED:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_start_from = 0;
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_UNALLOCATED:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_start_from = 0;
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+            cow_start_from = cow_start_to;
+            break;
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     /* Get the L2 entry from the last cluster */
-    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_index += nb_clusters - 1;
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_end_to = cow_end_from;
+    if (type == QCOW2_CLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_CLUSTER_COMPRESSED:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+            break;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_UNALLOCATED:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+        switch (type) {
+        case QCOW2_CLUSTER_NORMAL:
+            cow_end_to = cow_end_from;
+            break;
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     *m = g_malloc0(sizeof(**m));
@@ -1105,6 +1177,18 @@ static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
 
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+fail:
+    if (type == QCOW2_CLUSTER_INVALID) {
+        uint64_t l1_index = offset_to_l1_index(s, guest_offset);
+        uint64_t l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found "
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        return -EIO;
+    }
+
+    return 1;
 }
 
 /* Returns true if the cluster is unallocated or has refcount > 1 */
@@ -1313,10 +1397,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
                  - offset_into_cluster(s, guest_offset));
         assert(*bytes != 0);
 
-        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
-                          *bytes, l2_slice, m, true);
-
-        ret = 1;
+        ret = calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK,
+                                guest_offset, *bytes, l2_slice, m, true);
     } else {
         ret = 0;
     }
@@ -1488,10 +1570,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
-                      m, false);
-
-    ret = 1;
+    ret = calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                            l2_slice, m, false);
 
 out:
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 14/23] qcow2: Add subcluster support to qcow2_get_cluster_offset()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (12 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 13/23] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 15/23] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 111 ++++++++++++++++++++----------------------
 1 file changed, 52 insertions(+), 59 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8df0f67316..71d4cc518a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -372,66 +372,51 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. At most @nb_clusters are checked. Allocated clusters are
+ * also required to be contiguous in the image file.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+                                        unsigned sc_index, uint64_t *l2_slice,
+                                        int l2_index)
 {
     BDRVQcow2State *s = bs->opaque;
-    int i;
-    QCow2ClusterType first_cluster_type;
-    uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-    uint64_t offset = first_entry & mask;
-
-    first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-    if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-        return 0;
+    int i, j, count = 0;
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+    bool check_offset = true;
+    QCow2ClusterType type =
+        qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+    assert(type != QCOW2_CLUSTER_INVALID); /* The caller should check this */
+
+    if (type == QCOW2_CLUSTER_COMPRESSED) {
+        return 1; /* Compressed clusters are always counted one by one */
     }
 
-    /* must be allocated */
-    assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-           first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-    for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-        if (offset + (uint64_t) i * cluster_size != l2_entry) {
-            break;
-        }
+    if (type == QCOW2_CLUSTER_UNALLOCATED || type == QCOW2_CLUSTER_ZERO_PLAIN) {
+        check_offset = false;
     }
 
-        return i;
-}
-
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * slice have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
-                                                 int nb_clusters,
-                                                 uint64_t *l2_slice,
-                                                 int l2_index,
-                                                 QCow2ClusterType wanted_type)
-{
-    BDRVQcow2State *s = bs->opaque;
-    int i;
-
-    assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-           wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-        QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
-
-        if (type != wanted_type) {
-            break;
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+        l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+        if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            goto out;
+        }
+        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
+            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
+                goto out;
+            }
+            count++;
         }
+        expected_offset += s->cluster_size;
     }
 
-    return i;
+out:
+    return count;
 }
 
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
@@ -514,8 +499,8 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
                              unsigned int *bytes, uint64_t *cluster_offset)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned int l2_index;
-    uint64_t l1_index, l2_offset, *l2_slice;
+    unsigned int l2_index, sc_index;
+    uint64_t l1_index, l2_offset, *l2_slice, l2_bitmap;
     int c;
     unsigned int offset_in_cluster;
     uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -569,7 +554,9 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
+    sc_index = offset_to_sc_index(s, offset);
     *cluster_offset = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -577,7 +564,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
      * true */
     assert(nb_clusters <= INT_MAX);
 
-    type = qcow2_get_cluster_type(bs, *cluster_offset);
+    type = qcow2_get_subcluster_type(bs, *cluster_offset, l2_bitmap, sc_index);
     if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
                                 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
         qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -587,6 +574,13 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         goto fail;
     }
     switch (type) {
+    case QCOW2_CLUSTER_INVALID:
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found "
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        ret = -EIO;
+        goto fail;
+        break;
     case QCOW2_CLUSTER_COMPRESSED:
         if (has_data_file(bs)) {
             qcow2_signal_corruption(bs, true, -1, -1, "Compressed cluster "
@@ -602,16 +596,15 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         break;
     case QCOW2_CLUSTER_ZERO_PLAIN:
     case QCOW2_CLUSTER_UNALLOCATED:
-        /* how many empty clusters ? */
-        c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  l2_slice, l2_index, type);
+        c = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                         l2_slice, l2_index);
         *cluster_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
-        /* how many allocated clusters ? */
-        c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
+    case QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER:
+        c = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                         l2_slice, l2_index);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -640,7 +633,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
-    bytes_available = (int64_t)c * s->cluster_size;
+    bytes_available = ((int64_t)c + sc_index) * s->subcluster_size;
 
 out:
     if (bytes_available > bytes_needed) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 15/23] qcow2: Add subcluster support to zero_in_l2_slice()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (13 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 14/23] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 16/23] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 71d4cc518a..c554b1a88c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1849,7 +1849,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     assert(nb_clusters <= INT_MAX);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t old_offset;
+        uint64_t old_offset, l2_entry = 0;
         QCow2ClusterType cluster_type;
 
         old_offset = get_l2_entry(s, l2_slice, l2_index + i);
@@ -1866,12 +1866,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         }
+
+        if (has_subclusters(s)) {
+            set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
+        } else {
+            l2_entry |= QCOW_OFLAG_ZERO;
+        }
+
+        set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 16/23] qcow2: Add subcluster support to discard_in_l2_slice()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (14 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 15/23] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 17/23] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c554b1a88c..bf32447d18 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1769,7 +1769,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (!full_discard && s->qcow_version >= 3) {
+        if (has_subclusters(s)) {
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
+            set_l2_bitmap(s, l2_slice, l2_index + i,
+                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+        } else if (!full_discard && s->qcow_version >= 3) {
             set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
             set_l2_entry(s, l2_slice, l2_index + i, 0);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 17/23] qcow2: Add subcluster support to check_refcounts_l2()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (15 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 16/23] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 18/23] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-refcount.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index a2c4d36378..3eda523e25 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1685,8 +1685,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
-                        l2_entry = QCOW_OFLAG_ZERO;
-                        set_l2_entry(s, l2_table, i, l2_entry);
+                        if (has_subclusters(s)) {
+                            set_l2_entry(s, l2_table, i, 0);
+                            set_l2_bitmap(s, l2_table, i,
+                                          QCOW_L2_BITMAP_ALL_ZEROES);
+                        } else {
+                            set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+                        }
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 18/23] qcow2: Add subcluster support to expand_zero_clusters_in_l1()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (16 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 17/23] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 19/23] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Two changes are needed in order to add subcluster support to this
function: deallocated clusters must have their bitmaps cleared, and
expanded clusters must have all the "subcluster allocated" bits set.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index bf32447d18..dc72f0e595 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2033,6 +2033,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
                         set_l2_entry(s, l2_slice, j, 0);
+                        set_l2_bitmap(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
                     }
@@ -2099,6 +2100,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 } else {
                     set_l2_entry(s, l2_slice, j, offset);
                 }
+                set_l2_bitmap(s, l2_slice, j, QCOW_L2_BITMAP_ALL_ALLOC);
                 l2_dirty = true;
             }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 19/23] qcow2: Fix offset calculation in handle_dependencies()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (17 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 18/23] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 20/23] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

l2meta_cow_start() and l2meta_cow_end() are not necessarily
cluster-aligned if the image has subclusters, so update the
calculation of old_start and old_end to guarantee that no two requests
try to write on the same cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dc72f0e595..75579c1470 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1262,8 +1262,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
         uint64_t start = guest_offset;
         uint64_t end = start + bytes;
-        uint64_t old_start = l2meta_cow_start(old_alloc);
-        uint64_t old_end = l2meta_cow_end(old_alloc);
+        uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+        uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size);
 
         if (end <= old_start || start >= old_end) {
             /* No intersection */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 20/23] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (18 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 19/23] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 21/23] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

The L2 bitmap needs to be updated after each write to indicate what
new subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 75579c1470..9a4bf672b3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -980,6 +980,22 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 
         set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_COPIED |
                      (cluster_offset + (i << s->cluster_bits)));
+
+        /* Update bitmap with the subclusters that were just written */
+        if (has_subclusters(s)) {
+            uint64_t written_from = m->cow_start.offset;
+            uint64_t written_to = m->cow_end.offset + m->cow_end.nb_bytes;
+            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+            int sc;
+            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+                uint64_t sc_off = i * s->cluster_size + sc * s->subcluster_size;
+                if (sc_off >= written_from && sc_off < written_to) {
+                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+                }
+            }
+            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+        }
      }
 
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 21/23] qcow2: Add subcluster support to handle_alloc_space()
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (19 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 20/23] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 22/23] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill with zeroes the other
subclusters if we can guarantee that we are not overwriting existing
data. However this would waste more disk space, so we should first
evaluate if it's really worth doing.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c222cd261d..c54278ab0b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2194,6 +2194,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
+        uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+        uint64_t nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+            m->cow_start.offset;
 
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
             continue;
@@ -2208,16 +2211,14 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
          * efficiently zero out the whole clusters
          */
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-                                            m->nb_clusters * s->cluster_size,
+        ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
                                             true);
         if (ret < 0) {
             return ret;
         }
 
         BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-        ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-                                    m->nb_clusters * s->cluster_size,
+        ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
                                     BDRV_REQ_NO_FALLBACK);
         if (ret < 0) {
             if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 22/23] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (20 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 21/23] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:23 ` [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Ideally it should be possible to zero individual subclusters using
this function, but this is currently not implemented.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index c54278ab0b..2eb032aed7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3544,6 +3544,12 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         bytes = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
+        /* TODO: allow zeroing separate subclusters, we only allow
+         * zeroing full clusters at the moment. */
+        if (nr != bytes) {
+            qemu_co_mutex_unlock(&s->lock);
+            return -ENOTSUP;
+        }
         if (ret != QCOW2_CLUSTER_UNALLOCATED &&
             ret != QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER &&
             ret != QCOW2_CLUSTER_ZERO_PLAIN &&
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (21 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 22/23] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
@ 2019-10-15 15:23 ` Alberto Garcia
  2019-10-15 15:42   ` Eric Blake
  2019-10-15 16:05 ` [RFC PATCH 00/23] Add subcluster allocation to qcow2 Eric Blake
  2019-10-23 10:39 ` Vladimir Sementsov-Ogievskiy
  24 siblings, 1 reply; 29+ messages in thread
From: Alberto Garcia @ 2019-10-15 15:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Denis V . Lunev

Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c                    |  47 ++++++++++++++
 block/qcow2.h                    |   8 ++-
 include/block/block_int.h        |   1 +
 qapi/block-core.json             |   2 +
 tests/qemu-iotests/031.out       |   8 +--
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 +++++++++++++++----------------
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +++---
 tests/qemu-iotests/065           |  18 ++++--
 tests/qemu-iotests/082.out       |  48 ++++++++++++---
 tests/qemu-iotests/085.out       |  38 ++++++------
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +--
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 ++
 tests/qemu-iotests/242.out       |   5 ++
 tests/qemu-iotests/255.out       |   8 +--
 tests/qemu-iotests/common.filter |   1 +
 20 files changed, 221 insertions(+), 110 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2eb032aed7..44d97d30b1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1346,6 +1346,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
     s->subcluster_bits = ctz32(s->subcluster_size);
 
+    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
+        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
+        ret = -EINVAL;
+        goto fail;
+    }
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
@@ -2646,6 +2652,11 @@ int qcow2_update_header(BlockDriverState *bs)
                 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
                 .name = "lazy refcounts",
             },
+            {
+                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
+                .name = "extended L2 entries",
+            },
         };
 
         ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -3138,6 +3149,27 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         goto out;
     }
 
+    if (!qcow2_opts->has_extended_l2) {
+        qcow2_opts->extended_l2 = false;
+    }
+    if (qcow2_opts->extended_l2) {
+        unsigned min_cluster_size =
+            (1 << MIN_CLUSTER_BITS) * QCOW_MAX_SUBCLUSTERS_PER_CLUSTER;
+        if (version < 3) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "compatibility level 1.1 and above (use version=v3 or "
+                       "greater)");
+            ret = -EINVAL;
+            goto out;
+        }
+        if (cluster_size < min_cluster_size) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "cluster sizes of at least %u bytes", min_cluster_size);
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
     if (!qcow2_opts->has_refcount_bits) {
         qcow2_opts->refcount_bits = 16;
     }
@@ -3232,6 +3264,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
     }
 
+    if (qcow2_opts->extended_l2) {
+        header->incompatible_features |=
+            cpu_to_be64(QCOW2_INCOMPAT_EXTL2);
+    }
+
     ret = blk_pwrite(blk, 0, header, cluster_size, 0);
     g_free(header);
     if (ret < 0) {
@@ -3409,6 +3446,7 @@ static int coroutine_fn qcow2_co_create_opts(const char *filename, QemuOpts *opt
         { BLOCK_OPT_BACKING_FMT,        "backing-fmt" },
         { BLOCK_OPT_CLUSTER_SIZE,       "cluster-size" },
         { BLOCK_OPT_LAZY_REFCOUNTS,     "lazy-refcounts" },
+        { BLOCK_OPT_EXTL2,              "extended-l2" },
         { BLOCK_OPT_REFCOUNT_BITS,      "refcount-bits" },
         { BLOCK_OPT_ENCRYPT,            BLOCK_OPT_ENCRYPT_FORMAT },
         { BLOCK_OPT_COMPAT_LEVEL,       "version" },
@@ -4612,6 +4650,9 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .has_extended_l2    = true,
+            .extended_l2        = s->incompatible_features &
+                                  QCOW2_INCOMPAT_EXTL2,
             .refcount_bits      = s->refcount_bits,
             .has_bitmaps        = !!bitmaps,
             .bitmaps            = bitmaps,
@@ -5205,6 +5246,12 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_EXTL2,
+            .type = QEMU_OPT_BOOL,
+            .help = "Extended L2 tables",
+            .def_value_str = "off"
+        },
         {
             .name = BLOCK_OPT_REFCOUNT_BITS,
             .type = QEMU_OPT_NUMBER,
diff --git a/block/qcow2.h b/block/qcow2.h
index 60e4bf963e..ea76f51cb0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -217,13 +217,16 @@ enum {
     QCOW2_INCOMPAT_DIRTY_BITNR      = 0,
     QCOW2_INCOMPAT_CORRUPT_BITNR    = 1,
     QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+    QCOW2_INCOMPAT_EXTL2_BITNR      = 3,
     QCOW2_INCOMPAT_DIRTY            = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_CORRUPT          = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
     QCOW2_INCOMPAT_DATA_FILE        = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
+    QCOW2_INCOMPAT_EXTL2            = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
     QCOW2_INCOMPAT_MASK             = QCOW2_INCOMPAT_DIRTY
                                     | QCOW2_INCOMPAT_CORRUPT
-                                    | QCOW2_INCOMPAT_DATA_FILE,
+                                    | QCOW2_INCOMPAT_DATA_FILE
+                                    | QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -503,8 +506,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-    /* FIXME: Return false until this feature is complete */
-    return false;
+    return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0422acdf1c..cfcc32d1c7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,7 @@
 #define BLOCK_OPT_REFCOUNT_BITS     "refcount_bits"
 #define BLOCK_OPT_DATA_FILE         "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW     "data_file_raw"
+#define BLOCK_OPT_EXTL2             "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index e6edd641f1..72e04a47d1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -85,6 +85,7 @@
       'compat': 'str',
       '*data-file': 'str',
       '*data-file-raw': 'bool',
+      '*extended-l2': 'bool',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
       'refcount-bits': 'int',
@@ -4315,6 +4316,7 @@
   'data': { 'file':             'BlockdevRef',
             '*data-file':       'BlockdevRef',
             '*data-file-raw':   'bool',
+            '*extended-l2':     'bool',
             'size':             'size',
             '*version':         'BlockdevQcow2Version',
             '*backing-file':    'str',
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 68a74d03b9..614950be56 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -117,7 +117,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -150,7 +150,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -164,7 +164,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   3
-backing_file_offset       0x178
+backing_file_offset       0x1a8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -188,7 +188,7 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index e489b44386..c7e6512b43 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -58,7 +58,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 
@@ -86,7 +86,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 *** done
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index 6b505408dd..ba4b42fd58 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 2. Specifying size via -o ==
 
 qemu-img create -f qcow2 -o size=1024 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 3. Invalid sizes ==
 
@@ -124,84 +124,84 @@ and exabytes, respectively.
 == Check correct interpretation of suffixes for cluster size ==
 
 qemu-img create -f qcow2 -o cluster_size=1024 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check compat level option ==
 
 qemu-img create -f qcow2 -o compat=0.10 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.42 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '0.42'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=foobar TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter 'foobar'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check preallocation option ==
 
 qemu-img create -f qcow2 -o preallocation=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=metadata TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '1234'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check encryption option ==
 
 qemu-img create -f qcow2 -o encryption=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 --object secret,id=sec0,data=123456 -o encryption=on,encrypt.key-secret=sec0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check lazy_refcounts option (only with v3) ==
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=on TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Lazy refcounts only supported with compatibility level 1.1 and above (use version=v3 or greater)
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 *** done
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 0f6b0658a1..1f6ae50027 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -20,6 +20,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: true
+    extended l2: false
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot be opened read/write
 no file open, try 'help open'
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index d6a7c2af95..96a6933554 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -26,7 +26,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -84,7 +84,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -140,7 +140,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -195,7 +195,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -264,7 +264,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 65536/65536 bytes at offset 44040192
@@ -298,7 +298,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -327,7 +327,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 131072/131072 bytes at offset 0
@@ -496,6 +496,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 
 === Try changing the external data file ===
@@ -516,6 +517,7 @@ Format specific information:
     data file: foo
     data file raw: false
     corrupt: false
+    extended l2: false
 
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': 'data-file' is required for this image
 image: TEST_DIR/t.IMGFMT
@@ -528,6 +530,7 @@ Format specific information:
     refcount bits: 16
     data file raw: false
     corrupt: false
+    extended l2: false
 
 === Clearing and setting data-file-raw ===
 
@@ -543,6 +546,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: true
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
@@ -555,6 +559,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 qemu-img: data-file-raw cannot be set on existing images
 image: TEST_DIR/t.IMGFMT
@@ -568,5 +573,6 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 5b21eb96bd..7d3a137434 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -95,17 +95,21 @@ class TestQCow3NotLazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
@@ -113,7 +117,8 @@ class TestQCow3NotLazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=off'
     qemu_options = 'lazy-refcounts=on'
     compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 
 class TestQCow3LazyQMP(TestQMP):
@@ -122,7 +127,8 @@ class TestQCow3LazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=on'
     qemu_options = 'lazy-refcounts=off'
     compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 TestImageInfoSpecific = None
 TestQemuImgInfo = None
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 9d4ed4dc9d..2a01e8bac2 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -3,14 +3,14 @@ QA output created by 082
 === create: Options specified more than once ===
 
 Testing: create -f foo -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
 cluster_size: 65536
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -20,9 +20,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -32,9 +33,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -59,6 +61,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -82,6 +85,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -105,6 +109,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -128,6 +133,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -151,6 +157,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -174,6 +181,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -197,6 +205,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -220,6 +229,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -227,10 +237,10 @@ Supported options:
   size=<size>            - Virtual disk size
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,? TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2, -o help TEST_DIR/t.qcow2 128M
 qemu-img: Invalid option list: backing_file=TEST_DIR/t.qcow2,
@@ -258,6 +268,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -279,7 +290,7 @@ qemu-img: Format driver 'bochs' does not support image creation
 === convert: Options specified more than once ===
 
 Testing: create -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: convert -f foo -f qcow2 TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -302,6 +313,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -313,6 +325,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -339,6 +352,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -362,6 +376,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -385,6 +400,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -408,6 +424,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -431,6 +448,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -454,6 +472,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -477,6 +496,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -500,6 +520,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -538,6 +559,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -582,6 +604,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=130M -o lazy_refcounts=off TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -593,6 +616,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=8M -o lazy_refcounts=on -o size=132M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -604,6 +628,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=4M,size=148M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -630,6 +655,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -654,6 +680,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -678,6 +705,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -702,6 +730,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -726,6 +755,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -750,6 +780,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -774,6 +805,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -798,6 +830,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -839,6 +872,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index 2a5f256cd3..62e7fd82a7 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -11,7 +11,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728
 
 === Create a single snapshot on virtio0 ===
 
-Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Invalid command - missing device and nodename ===
@@ -25,32 +25,32 @@ Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file
 
 === Create several transactional group snapshots ===
 
-Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
-Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Create a couple of snapshots using blockdev-snapshot ===
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index a9a8216bea..7bf05253cc 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -7,7 +7,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912
 === Performing Live Snapshot 1 ===
 
 {"return": {}}
-Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Performing block-commit on active layer ===
@@ -26,6 +26,6 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/
 
 === Performing Live Snapshot 2 ===
 
-Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 *** done
diff --git a/tests/qemu-iotests/182.out b/tests/qemu-iotests/182.out
index ffef23e32b..78e2d51761 100644
--- a/tests/qemu-iotests/182.out
+++ b/tests/qemu-iotests/182.out
@@ -10,7 +10,7 @@ Is another process using the image [TEST_DIR/t.qcow2]?
 
 {"return": {}}
 {"return": {}}
-Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 {"return": {}}
 {"return": {}}
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index ddfbf3c765..eaab057fed 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -7,12 +7,12 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
 
 === Creating backing chain ===
 
-Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 wrote 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Start commit job and exit qemu ===
@@ -37,7 +37,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 === Start mirror job and exit qemu ===
 
 {"return": {}}
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
@@ -48,7 +48,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 === Start backup job and exit qemu ===
 
 {"return": {}}
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
index e86b175e39..e46dccdb08 100644
--- a/tests/qemu-iotests/198.out
+++ b/tests/qemu-iotests/198.out
@@ -72,6 +72,7 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 
 == checking image layer ==
 image: json:{"encrypt.key-secret": "sec1", "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/t.IMGFMT"}}
@@ -115,4 +116,5 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 *** done
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index 61e7241e0b..d2efc0394a 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -21,6 +21,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (inline blockdev-add, explicit defaults) ===
 
@@ -43,6 +44,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v3 non-default options) ===
 
@@ -65,6 +67,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 1
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v2 non-default options) ===
 
@@ -141,6 +144,7 @@ Format specific information:
         payload offset: 528384
         master key iters: XXX
     corrupt: false
+    extended l2: false
 
 === Invalid BlockdevRef ===
 
diff --git a/tests/qemu-iotests/242.out b/tests/qemu-iotests/242.out
index 7ac8404d11..0d32dd9148 100644
--- a/tests/qemu-iotests/242.out
+++ b/tests/qemu-iotests/242.out
@@ -15,6 +15,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 No bitmap in JSON format output
 
@@ -40,6 +41,7 @@ Format specific information:
             granularity: 32768
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -77,6 +79,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -119,6 +122,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -162,5 +166,6 @@ Format specific information:
             granularity: 16384
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Test complete
diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
index 348909fdef..4e1b917a0f 100644
--- a/tests/qemu-iotests/255.out
+++ b/tests/qemu-iotests/255.out
@@ -3,9 +3,9 @@ Finishing a commit job with background reads
 
 === Create backing chain and start VM ===
 
-Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 === Start background read requests ===
 
@@ -23,9 +23,9 @@ Closing the VM while a job is being cancelled
 
 === Create images and start VM ===
 
-Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 wrote 1048576/1048576 bytes at offset 0
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 9f418b4881..45c0fe746b 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -137,6 +137,7 @@ _filter_img_create()
         -e "s# adapter_type=[^ ]*##g" \
         -e "s# hwversion=[^ ]*##g" \
         -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+        -e "s# extended_l2=\\(on\\|off\\)##g" \
         -e "s# block_size=[0-9]\\+##g" \
         -e "s# block_state_zero=\\(on\\|off\\)##g" \
         -e "s# log_size=[0-9]\\+##g" \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2019-10-15 15:23 ` [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2019-10-15 15:42   ` Eric Blake
  0 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2019-10-15 15:42 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On 10/15/19 10:23 AM, Alberto Garcia wrote:
> Now that the implementation of subclusters is complete we can finally
> add the necessary options to create and read images with this feature,
> which we call "extended L2 entries".
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---

> +++ b/qapi/block-core.json
> @@ -85,6 +85,7 @@
>         'compat': 'str',
>         '*data-file': 'str',
>         '*data-file-raw': 'bool',
> +      '*extended-l2': 'bool',
>         '*lazy-refcounts': 'bool',
>         '*corrupt': 'bool',
>         'refcount-bits': 'int',

Missing documentation for the new member.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (22 preceding siblings ...)
  2019-10-15 15:23 ` [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2019-10-15 16:05 ` Eric Blake
  2019-10-16  7:14   ` Alberto Garcia
  2019-10-23 10:39 ` Vladimir Sementsov-Ogievskiy
  24 siblings, 1 reply; 29+ messages in thread
From: Eric Blake @ 2019-10-15 16:05 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On 10/15/19 10:23 AM, Alberto Garcia wrote:
> Hi,
> 
> this series adds a new feature to the qcow2 on-disk format called
> "Extended L2 Entries", which allows us to do subcluster allocation.
> 
> This cover letter explains the reasons behind this proposal, the
> changes to the on-disk format, test results and pending work. If you
> are curious you can also have a look at previous discussions about
> this feature:
> 

> === Changes to the on-disk format ===
> 
> An L2 entry is 64 bits wide, with this format (for uncompressed
> clusters):
> 
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> **<----> <--------------------------------------------------><------->*
>    Rsrved              host cluster offset of data             Reserved
>    (6 bits)                (47 bits)                           (8 bits)
> 
>      bit 63: refcount == 1   (QCOW_OFLAG_COPIED)
>      bit 62: compressed = 1  (QCOW_OFLAG_COMPRESSED)
>      bit  0: all zeros       (QCOW_OFLAG_ZERO)
> 
> If Extended L2 Entries are enabled, bit 0 becomes reserved and must be
> unset, and this 64-bit bitmap follows the entry:
> 
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> <---------------------------------> <--------------------------------->
>       subcluster reads as zeros            subcluster is allocated
>               (32 bits)                           (32 bits)

I like the grouping - you can then do a 4-byte read and comparison to 0 
to see if the entire cluster reads as zeroes or is unallocated.

With 32k clusters, this results in 1k subclusters.  In cluster 1 (offset 
32k), which bits map where?  (The obvious choices are that sub-cluster 
32k maps to bit 0, 33k maps to bit 1, ...; or that sub-cluster 32k maps 
to bit 31, 33k maps to bit 30, ...)

/me reads ahead

okay, in patch 5, you said you map the most significant bit to the first 
cluster. That feels backwards to me; I wonder if the math is any easier 
if you map sub-clusters starting from the least-significant, because 
then you get:

bit = (address >> cluster_size) & 32

rather than

bit = 31 - ((address >> cluster_size) & 32)


> Some comments about the results:
> 
> - The smallest allowed cluster size for an image with subclusters is
>    16 KB (in this case the subclusters size is 512 bytes), hence the
>    missing values in the 4 KB and 8 KB rows.

Again reading ahead, I see that patch 5 requires a 16k minimum cluster 
for using extended L2.  Could we still permit clusters smaller than 
that, but merely document that subclusters are always a minimum of 512 
bytes and therefore for an 8k cluster we only use 16 bits (leaving the 
other 16 bits zero)?  But I'm also fine with the simplicity of just 
stating that subclusters require at least 16k clusters.


> === To do ===
> 
> A couple of things are missing from this series:
> 
> - The ability to efficiently zero individual subclusters using
>    qcow2_co_pwrite_zeroes(). At the moment only full clusters can be
>    zeroed with this method.
> 
> - Alternatively we could get rid of the individual "all zeroes" bits
>    altogether and have 64 subclusters per cluster. We would still have
>    the QCOW_OFLAG_ZERO bit in the standard cluster descriptor.

I think you've got more flexibility with the two bits per sub-cluster 
than you would with just 1 bit and 64 subclusters, so I don't think this 
direction is going to get us far.

> 
> - The number of subclusters per cluster is always 32. It would be
>    trivial to allow configuring this, but I don't see any use case.

Agreed.

> 
> - Tests: I have a few written that I'll add in future revisions of
>    this series.
> 
> - handle_alloc_space() works at the subclusters level. That is, if you
>    have an unallocated 2MB cluster with 64KB subclusters, no backing
>    image and you write 4KB of data, QEMU won't write zeroes to the
>    affected subcluster(s) and will use handle_alloc_space() instead.
>    The other subclusters won't be touched and will remain unallocated.
>    This behavior is consistent with how subclusters work and saves disk
>    space, but offers slightly lower performance (see test results
>    above). Theoretically we could offer a setting to configure this,
>    but I'm not convinced that this is very useful.
> 
> ===========================
> 
> As usual, feedback is welcome,

Looks promising!

How do subclusters interact with external data files?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2
  2019-10-15 16:05 ` [RFC PATCH 00/23] Add subcluster allocation to qcow2 Eric Blake
@ 2019-10-16  7:14   ` Alberto Garcia
  0 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-16  7:14 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Tue 15 Oct 2019 06:05:23 PM CEST, Eric Blake wrote:

>> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
>> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> <---------------------------------> <--------------------------------->
>>       subcluster reads as zeros            subcluster is allocated
>>               (32 bits)                           (32 bits)
>
> okay, in patch 5, you said you map the most significant bit to the
> first cluster. That feels backwards to me; I wonder if the math is any
> easier if you map sub-clusters starting from the least-significant,
> because then you get:
>
> bit = (address >> cluster_size) & 32
>
> rather than
>
> bit = 31 - ((address >> cluster_size) & 32)

The reason why I chose that ordering is because I think it's more
natural for debugging if you read from left to right:

63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
00000000 00000000 00000000 00011111 11100000 00000000 00000000 00000000
<---------------------------------> <--------------------------------->
      subcluster reads as zeros            subcluster is allocated

Here the last five subclusters read as zeros, and the first three
subclusters are allocated.

I don't think the math is any different. What you need in the code is

  1) A way to get the subcluster index. That doesn't change, it's

        sc_index = (address >> cluster_bits) & 31

     in both cases.

  2) A way to get the "subcluster reads as zeros" and "subcluster is
     allocated" masks. That's not very different either, it's a constant
     shifted by the subcluster index in both cases:

     LSB first:

        all_zeros_mask = (1 << 32) << sc_index
        allocated_mask = 1 << sc_index

     MSB first:

        all_zeros_mask = (1 << 63) >> sc_index
        allocated_mask = (1 << 31) >> sc_index

>> Some comments about the results:
>> 
>> - The smallest allowed cluster size for an image with subclusters is
>>    16 KB (in this case the subclusters size is 512 bytes), hence the
>>    missing values in the 4 KB and 8 KB rows.
>
> Again reading ahead, I see that patch 5 requires a 16k minimum cluster 
> for using extended L2.  Could we still permit clusters smaller than 
> that, but merely document that subclusters are always a minimum of 512 
> bytes and therefore for an 8k cluster we only use 16 bits (leaving the 
> other 16 bits zero)?  But I'm also fine with the simplicity of just 
> stating that subclusters require at least 16k clusters.

I can't think of any reason why you would want smaller clusters, the
numbers show that the performance starts to drop with sizes under 16KB.

> How do subclusters interact with external data files?

As far as I'm aware they work just fine (I'll add tests for that
anyway).

Berto


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2
  2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (23 preceding siblings ...)
  2019-10-15 16:05 ` [RFC PATCH 00/23] Add subcluster allocation to qcow2 Eric Blake
@ 2019-10-23 10:39 ` Vladimir Sementsov-Ogievskiy
  2019-10-26 21:26   ` Alberto Garcia
  24 siblings, 1 reply; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-10-23 10:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Denis Lunev, qemu-block, Max Reitz

Hi!

This is very interesting! Could you please export a branch to look at,
as patches can't be applied on master now :(

15.10.2019 18:23, Alberto Garcia wrote:
> Hi,
> 
> this series adds a new feature to the qcow2 on-disk format called
> "Extended L2 Entries", which allows us to do subcluster allocation.
> 
> This cover letter explains the reasons behind this proposal, the
> changes to the on-disk format, test results and pending work. If you
> are curious you can also have a look at previous discussions about
> this feature:
> 
>     https://lists.gnu.org/archive/html/qemu-block/2017-04/msg00178.html
>     https://lists.gnu.org/archive/html/qemu-block/2019-06/msg01155.html
> 
> This is the first proper version of the patches, and I believe that
> the implementation is complete. However since I'm proposing a change
> to the on-disk format I'm labeling this as RFC because I'm expecting
> some debate. I'll remove the RFC tag and add more tests in future
> revisions.
> 
> === Problem ===
> 
> A qcow2 image is divided into units of constant size called clusters,
> and among other things it contains metadata that maps guest addresses
> to host addresses (the so-called L1 and L2 tables).
> 
> There are two basic problems that result from this:
> 
> 1) Reading from or writing to a qcow2 image involves reading the
>     corresponding entry on the L2 table that maps the guest address to
>     the host address. This is very slow because it involves two I/O
>     operations: one on the L2 table and the other one on the actual
>     data cluster.
> 
> 2) A cluster is the smallest unit of allocation. Therefore writing a
>     mere 512 bytes to an empty disk requires allocating a complete
>     cluster and filling it with zeroes (or with data from the backing
>     image if there is one). This wastes more disk space and also has a
>     negative impact on I/O.
> 
> Problem (1) can be solved by caching the L2 tables in memory. The
> maximum amount of disk space used by L2 tables depends on the virtual
> disk size and the cluster size:
> 
>     max_l2_size = virtual_disk_size * 8 / cluster_size
> 
> Because of this, the only way to reduce the size of the L2 tables is
> by increasing the cluster size (which can be any power of two between
> 512 bytes and 2 MB). But then we hit problem (2): I/O is slower and
> more disk space is wasted.
> 
> === The proposal ===
> 
> The proposal is to extend the qcow2 format by allowing subcluster
> allocation. The on-disk format remains essentially the same, except
> that each data cluster is internally divided into 32 subclusters of
> equal size.
> 
> The way it works in practice is with a new optional feature called
> "Extended L2 Entries", that needs to be enabled when an image is
> created. With this, each entry on an L2 table is accompanied by a
> bitmap indicating the allocation state of each one of the subclusters
> for that cluster. The size of an L2 entry doubles from 64 to 128 bits.
> 
> Other than L2 entries, all other data structures remain unchanged, but
> for data clusters the smallest unit of allocation is now the
> subcluster. Reference counting is still at the cluster level, because
> there is no way to reference individual subclusters. Copy-on-write on
> internal snapshots needs to copy complete clusters, so that scenario
> would not benefit from this change.
> 
> I see two main use cases for this feature:
> 
> a) The qcow2 image is not too large / the L2 cache is not a problem,
>     but you want to increase the allocation performance. In this case
>     you can have a 128KB cluster with 4KB subclusters (with 4KB being a
>     common block size in ext4 and other filesystems)
> 
> b) The qcow2 image is very large and you want to save metadata space
>     in order to have a smaller L2 cache. In this case you can go for
>     the maximum cluster size (2MB) but you want to have smaller
>     subclusters to increase the allocation performance and optimize the
>     disk usage.
> 
> === Changes to the on-disk format ===
> 
> An L2 entry is 64 bits wide, with this format (for uncompressed
> clusters):
> 
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> **<----> <--------------------------------------------------><------->*
>    Rsrved              host cluster offset of data             Reserved
>    (6 bits)                (47 bits)                           (8 bits)
> 
>      bit 63: refcount == 1   (QCOW_OFLAG_COPIED)
>      bit 62: compressed = 1  (QCOW_OFLAG_COMPRESSED)
>      bit  0: all zeros       (QCOW_OFLAG_ZERO)
> 
> If Extended L2 Entries are enabled, bit 0 becomes reserved and must be
> unset, and this 64-bit bitmap follows the entry:
> 
> 63    56 55    48 47    40 39    32 31    24 23    16 15     8 7      0
> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> <---------------------------------> <--------------------------------->
>       subcluster reads as zeros            subcluster is allocated
>               (32 bits)                           (32 bits)
> 
> All this applies to uncompressed clusters. Compressed clusters are not
> divided into subclusters, the cluster descriptor remains exactly the
> same, and the 64-bit bitmap is not used (i.e. all bits are always 0).
> 
> === Test results ===
> 
> I made all tests on an SSD drive, writing to an empty qcow2 image with
> a fully populated 40GB backing image, performing random writes using
> fio with a block size of 4KB. I ran the tests with all available
> cluster sizes starting from 4KB.
> 
> It's important to point out that once a cluster has been completely
> allocated then having subclusters offers no performance benefit. For
> this reason the size of the image for these tests (40GB) was chosen to
> be large enough to guarantee that there are always new clusters being
> allocated. This is therefore a worst-case scenario (or best-case for
> this feature, if you want).
> 
> Subcluster sizes are in brackets:
> 
> |-----------------+----------------+-----------------|
> |   Cluster size  | subclusters=on | subclusters=off |
> |-----------------+----------------+-----------------|
> |    4 KB ( N/A ) |            N/A |         95 IOPS |
> |    8 KB ( N/A ) |            N/A |        599 IOPS |
> |   16 KB (512 B) |      4129 IOPS |       3597 IOPS |
> |   32 KB  (1 KB) |     11255 IOPS |       2642 IOPS |
> |   64 KB  (2 KB) |     13341 IOPS |       1671 IOPS |
> |  128 KB  (4 KB) |     12391 IOPS |        870 IOPS |
> |  256 KB  (8 KB) |      9645 IOPS |        566 IOPS |
> |  512 KB (16 KB) |      4960 IOPS |        359 IOPS |
> | 1024 KB (32 KB) |      2732 IOPS |        215 IOPS |
> | 2048 KB (64 KB) |      1630 IOPS |        214 IOPS |
> |-----------------+----------------+-----------------|
> 
> Here are the same tests, but without any backing image:
> 
> |-----------------+----------------+-----------------|
> |   Cluster size  | subclusters=on | subclusters=off |
> |-----------------+----------------+-----------------|
> |    4 KB ( N/A ) |            N/A |         93 IOPS |
> |    8 KB ( N/A ) |            N/A |        539 IOPS |
> |   16 KB (512 B) |      4174 IOPS |       7598 IOPS |
> |   32 KB  (1 KB) |     11326 IOPS |      11957 IOPS |
> |   64 KB  (2 KB) |     13516 IOPS |      13375 IOPS |
> |  128 KB  (4 KB) |     12435 IOPS |      13274 IOPS |
> |  256 KB  (8 KB) |     12071 IOPS |      14174 IOPS |
> |  512 KB (16 KB) |     12169 IOPS |      14343 IOPS |
> | 1024 KB (32 KB) |     12307 IOPS |      14622 IOPS |
> | 2048 KB (64 KB) |     12784 IOPS |      14574 IOPS |
> |-----------------+----------------+-----------------|
> 
> Some comments about the results:
> 
> - The smallest allowed cluster size for an image with subclusters is
>    16 KB (in this case the subclusters size is 512 bytes), hence the
>    missing values in the 4 KB and 8 KB rows.
> 
> - In images with a backing file: allocation is much faster when
>    subclusters are enabled. As expected, images with a cluster size of
>    64KB perform similar to images with a subcluster size of 64KB. When
>    there is no copy-on-write involved (subcluster size <= 4KB) then the
>    maximum performance is achieved.
> 
> - In images without a backing file: Since commit c8bb23cbdb when empty
>    clusters are allocated for the first time they are filled with
>    zeroes using an efficient method (typically fallocate() with
>    FALLOC_FL_ZERO_RANGE). This is so fast that having subclusters here
>    is actually a bit slower in most cases (although it still saves disk
>    space).
> 
> - The 16 KB cluster / 512 byte subcluster case is quite slow.
>    I haven't debugged this but I suspect that this is because new
>    clusters need to be allocated all the time, and also L2 and refcount
>    tables are very small and need to grow all the time. The same pattern
>    can be seen in images without subclusters.
> 
> === To do ===
> 
> A couple of things are missing from this series:
> 
> - The ability to efficiently zero individual subclusters using
>    qcow2_co_pwrite_zeroes(). At the moment only full clusters can be
>    zeroed with this method.
> 
> - Alternatively we could get rid of the individual "all zeroes" bits
>    altogether and have 64 subclusters per cluster. We would still have
>    the QCOW_OFLAG_ZERO bit in the standard cluster descriptor.
> 
> - The number of subclusters per cluster is always 32. It would be
>    trivial to allow configuring this, but I don't see any use case.
> 
> - Tests: I have a few written that I'll add in future revisions of
>    this series.
> 
> - handle_alloc_space() works at the subclusters level. That is, if you
>    have an unallocated 2MB cluster with 64KB subclusters, no backing
>    image and you write 4KB of data, QEMU won't write zeroes to the
>    affected subcluster(s) and will use handle_alloc_space() instead.
>    The other subclusters won't be touched and will remain unallocated.
>    This behavior is consistent with how subclusters work and saves disk
>    space, but offers slightly lower performance (see test results
>    above). Theoretically we could offer a setting to configure this,
>    but I'm not convinced that this is very useful.
> 
> ===========================
> 
> As usual, feedback is welcome,
> 
> Berto
> 
> Alberto Garcia (23):
>    qcow2: Add calculate_l2_meta()
>    qcow2: Split cluster_needs_cow() out of count_cow_clusters()
>    qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
>    qcow2: Add get_l2_entry() and set_l2_entry()
>    qcow2: Document the Extended L2 Entries feature
>    qcow2: Add dummy has_subclusters() function
>    qcow2: Add subcluster-related fields to BDRVQcow2State
>    qcow2: Add offset_to_sc_index()
>    qcow2: Add l2_entry_size()
>    qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
>    qcow2: Add qcow2_get_subcluster_type()
>    qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER
>    qcow2: Add subcluster support to calculate_l2_meta()
>    qcow2: Add subcluster support to qcow2_get_cluster_offset()
>    qcow2: Add subcluster support to zero_in_l2_slice()
>    qcow2: Add subcluster support to discard_in_l2_slice()
>    qcow2: Add subcluster support to check_refcounts_l2()
>    qcow2: Add subcluster support to expand_zero_clusters_in_l1()
>    qcow2: Fix offset calculation in handle_dependencies()
>    qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
>    qcow2: Add subcluster support to handle_alloc_space()
>    qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
>    qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
> 
>   block/qcow2-cluster.c            | 547 ++++++++++++++++++++-----------
>   block/qcow2-refcount.c           |  38 ++-
>   block/qcow2.c                    |  83 ++++-
>   block/qcow2.h                    | 121 ++++++-
>   docs/interop/qcow2.txt           |  68 +++-
>   docs/qcow2-cache.txt             |  19 +-
>   include/block/block_int.h        |   1 +
>   qapi/block-core.json             |   2 +
>   tests/qemu-iotests/031.out       |   8 +-
>   tests/qemu-iotests/036.out       |   4 +-
>   tests/qemu-iotests/049.out       | 102 +++---
>   tests/qemu-iotests/060.out       |   1 +
>   tests/qemu-iotests/061.out       |  20 +-
>   tests/qemu-iotests/065           |  18 +-
>   tests/qemu-iotests/082.out       |  48 ++-
>   tests/qemu-iotests/085.out       |  38 +--
>   tests/qemu-iotests/144.out       |   4 +-
>   tests/qemu-iotests/182.out       |   2 +-
>   tests/qemu-iotests/185.out       |   8 +-
>   tests/qemu-iotests/198.out       |   2 +
>   tests/qemu-iotests/206.out       |   4 +
>   tests/qemu-iotests/242.out       |   5 +
>   tests/qemu-iotests/255.out       |   8 +-
>   tests/qemu-iotests/common.filter |   1 +
>   24 files changed, 817 insertions(+), 335 deletions(-)
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH 00/23] Add subcluster allocation to qcow2
  2019-10-23 10:39 ` Vladimir Sementsov-Ogievskiy
@ 2019-10-26 21:26   ` Alberto Garcia
  0 siblings, 0 replies; 29+ messages in thread
From: Alberto Garcia @ 2019-10-26 21:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Denis Lunev, qemu-block, Max Reitz

On Wed 23 Oct 2019 12:39:14 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
>
> This is very interesting! Could you please export a branch to look at,
> as patches can't be applied on master now :(

I just sent a new version with some updates and rebased on top of the
current master.

Berto


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2019-10-26 21:53 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-15 15:23 [RFC PATCH 00/23] Add subcluster allocation to qcow2 Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 01/23] qcow2: Add calculate_l2_meta() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 02/23] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 03/23] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 04/23] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 05/23] qcow2: Document the Extended L2 Entries feature Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 06/23] qcow2: Add dummy has_subclusters() function Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 07/23] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 08/23] qcow2: Add offset_to_sc_index() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 09/23] qcow2: Add l2_entry_size() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 10/23] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 11/23] qcow2: Add qcow2_get_subcluster_type() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 12/23] qcow2: Handle QCOW2_CLUSTER_UNALLOCATED_SUBCLUSTER Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 13/23] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 14/23] qcow2: Add subcluster support to qcow2_get_cluster_offset() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 15/23] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 16/23] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 17/23] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 18/23] qcow2: Add subcluster support to expand_zero_clusters_in_l1() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 19/23] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 20/23] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 21/23] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 22/23] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
2019-10-15 15:23 ` [RFC PATCH 23/23] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
2019-10-15 15:42   ` Eric Blake
2019-10-15 16:05 ` [RFC PATCH 00/23] Add subcluster allocation to qcow2 Eric Blake
2019-10-16  7:14   ` Alberto Garcia
2019-10-23 10:39 ` Vladimir Sementsov-Ogievskiy
2019-10-26 21:26   ` Alberto Garcia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.