* [PATCH v4 00/30] Add subcluster allocation to qcow2
@ 2020-03-17 18:15 Alberto Garcia
  2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
                   ` (30 more replies)
  0 siblings, 31 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:15 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Hi,

here's the new version of the patches to add subcluster allocation
support to qcow2.

Please refer to the cover letter of the first version for a full
description of the patches:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

I think that this version fixes all the problems pointed out by Max
and Eric during the review a couple of weeks ago. I also dropped the
RFC tag.

Berto

v4:
- Patch 01: New patch
- Patch 02: New patch
- Patch 05: Documentation updates [Eric]
- Patch 06: Fix rebase conflicts
- Patch 07: Change bit order in the subcluster allocation bitmap.
            Change incompatible bit number. [Max, Eric]
- Patch 09: Rename QCOW_MAX_SUBCLUSTERS_PER_CLUSTER to
            QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER [Eric]
- Patch 13: Change bit order in the subcluster allocation bitmap [Max, Eric]
            Add more documentation.
            Ignore the subcluster bitmap in the L2 entries of
            compressed clusters.
- Patch 14: New patch
- Patch 15: Update to work with the changes from patches 02 and 14.
- Patch 16: Update to work with the changes from patches 02 and 14.
- Patch 18: Update to work with the changes from patches 02 and 14.
            Update documentation.
            Fix return value on early exit.
- Patch 20: Make sure to clear the subcluster allocation bitmap when a
            cluster is unallocated.
- Patch 26: Update to work with the changes from patch 14.
- Patch 27: New patch [Max]
- Patch 28: Update version number, incompatible bit number and test
            expectations.
- Patch 30: Add new tests.
            Make the test verify its own results. [Max]

v3: https://lists.gnu.org/archive/html/qemu-block/2019-12/msg00587.html
- Patch 01: Rename host_offset to host_cluster_offset and make 'bytes'
            an unsigned int [Max]
- Patch 03: Rename cluster_needs_cow to cluster_needs_new_alloc and
            count_cow_clusters to count_single_write_clusters. Update
            documentation and add more assertions and checks [Max]
- Patch 09: Update qcow2_co_truncate() to properly support extended L2
            entries [Max]
- Patch 10: Forbid calling set_l2_bitmap() if the image does not have
            extended L2 entries [Max]
- Patch 11 (new): Add QCow2SubclusterType [Max]
- Patch 12 (new): Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
- Patch 13 (new): Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
- Patch 14: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
- Patch 15: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
- Patch 19: Don't call set_l2_bitmap() if the image does not have
            extended L2 entries [Max]
- Patch 21: Use smaller data types.
- Patch 22: Don't call set_l2_bitmap() if the image does not have
            extended L2 entries [Max]
- Patch 23: Use smaller data types.
- Patch 25: Update test results and documentation. Move the check for
            the minimum subcluster size to validate_cluster_size().
- Patch 26 (new): Add subcluster support to qcow2_measure()
- Patch 27: Add more tests

v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
- Patch 12: Update after the changes in 88f468e546.
- Patch 21 (new): Clear the L2 bitmap when allocating a compressed
  cluster. Compressed clusters should have the bitmap all set to 0.
- Patch 24: Document the new fields in the QAPI documentation [Eric].
- Patch 25: Allow qcow2 preallocation with backing files.
- Patch 26: Add some tests for qcow2 images with extended L2 entries.

v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html

Output of git backport-diff against v3:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/30:[down] 'qcow2: Make Qcow2AioTask store the full host offset'
002/30:[down] 'qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()'
003/30:[----] [-C] 'qcow2: Add calculate_l2_meta()'
004/30:[----] [--] 'qcow2: Split cluster_needs_cow() out of count_cow_clusters()'
005/30:[0020] [FC] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()'
006/30:[0010] [FC] 'qcow2: Add get_l2_entry() and set_l2_entry()'
007/30:[0020] [FC] 'qcow2: Document the Extended L2 Entries feature'
008/30:[----] [--] 'qcow2: Add dummy has_subclusters() function'
009/30:[0004] [FC] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
010/30:[----] [--] 'qcow2: Add offset_to_sc_index()'
011/30:[----] [-C] 'qcow2: Add l2_entry_size()'
012/30:[----] [--] 'qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()'
013/30:[0046] [FC] 'qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()'
014/30:[down] 'qcow2: Add cluster type parameter to qcow2_get_host_offset()'
015/30:[0082] [FC] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
016/30:[0002] [FC] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
017/30:[----] [-C] 'qcow2: Add subcluster support to calculate_l2_meta()'
018/30:[down] 'qcow2: Add subcluster support to qcow2_get_host_offset()'
019/30:[----] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
020/30:[0012] [FC] 'qcow2: Add subcluster support to discard_in_l2_slice()'
021/30:[----] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
022/30:[----] [--] 'qcow2: Fix offset calculation in handle_dependencies()'
023/30:[----] [-C] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
024/30:[----] [--] 'qcow2: Clear the L2 bitmap when allocating a compressed cluster'
025/30:[----] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
026/30:[0006] [FC] 'qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only'
027/30:[down] 'qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters'
028/30:[0019] [FC] 'qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit'
029/30:[----] [--] 'qcow2: Add subcluster support to qcow2_measure()'
030/30:[0313] [FC] 'iotests: Add tests for qcow2 images with extended L2 entries'

Alberto Garcia (30):
  qcow2: Make Qcow2AioTask store the full host offset
  qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  qcow2: Add calculate_l2_meta()
  qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  qcow2: Add get_l2_entry() and set_l2_entry()
  qcow2: Document the Extended L2 Entries feature
  qcow2: Add dummy has_subclusters() function
  qcow2: Add subcluster-related fields to BDRVQcow2State
  qcow2: Add offset_to_sc_index()
  qcow2: Add l2_entry_size()
  qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  qcow2: Add cluster type parameter to qcow2_get_host_offset()
  qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  qcow2: Add subcluster support to calculate_l2_meta()
  qcow2: Add subcluster support to qcow2_get_host_offset()
  qcow2: Add subcluster support to zero_in_l2_slice()
  qcow2: Add subcluster support to discard_in_l2_slice()
  qcow2: Add subcluster support to check_refcounts_l2()
  qcow2: Fix offset calculation in handle_dependencies()
  qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  qcow2: Clear the L2 bitmap when allocating a compressed cluster
  qcow2: Add subcluster support to handle_alloc_space()
  qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  qcow2: Assert that expand_zero_clusters_in_l1() does not support
    subclusters
  qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  qcow2: Add subcluster support to qcow2_measure()
  iotests: Add tests for qcow2 images with extended L2 entries

 docs/interop/qcow2.txt           |  68 ++-
 docs/qcow2-cache.txt             |  19 +-
 qapi/block-core.json             |   7 +
 block/qcow2.h                    | 178 +++++++-
 include/block/block_int.h        |   1 +
 block/qcow2-cluster.c            | 696 ++++++++++++++++++++-----------
 block/qcow2-refcount.c           |  38 +-
 block/qcow2.c                    | 257 +++++++-----
 tests/qemu-iotests/031.out       |   8 +-
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 ++---
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061           |   6 +
 tests/qemu-iotests/061.out       |  25 +-
 tests/qemu-iotests/065           |  18 +-
 tests/qemu-iotests/082.out       |  48 ++-
 tests/qemu-iotests/085.out       |  38 +-
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +-
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 +
 tests/qemu-iotests/242.out       |   5 +
 tests/qemu-iotests/255.out       |   8 +-
 tests/qemu-iotests/271           | 359 ++++++++++++++++
 tests/qemu-iotests/271.out       | 244 +++++++++++
 tests/qemu-iotests/280.out       |   2 +-
 tests/qemu-iotests/common.filter |   1 +
 tests/qemu-iotests/group         |   1 +
 29 files changed, 1682 insertions(+), 472 deletions(-)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out

-- 
2.20.1




* [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
@ 2020-03-17 18:15 ` Alberto Garcia
  2020-03-18 11:23   ` Eric Blake
                     ` (2 more replies)
  2020-03-17 18:15 ` [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
                   ` (29 subsequent siblings)
  30 siblings, 3 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:15 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
host offset. In practice this is not very useful because all users(*)
of this structure need the final host offset into the cluster, which
they calculate using

   host_offset = file_cluster_offset + offset_into_cluster(s, offset)

There is no reason why Qcow2AioTask cannot store host_offset directly
and that is what this patch does.
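
As a rough illustration of the calculation above (this is not code from
the patch), the sketch below assumes a power-of-two cluster size and
that offset_into_cluster() is a simple mask. DemoState and the demo_*
helpers are hypothetical stand-ins for BDRVQcow2State and
offset_into_cluster():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of BDRVQcow2State. */
typedef struct {
    uint64_t cluster_size; /* assumed to be a power of two */
} DemoState;

/* Byte position of 'offset' inside its cluster (assumed to be a mask). */
static inline uint64_t demo_offset_into_cluster(const DemoState *s,
                                                uint64_t offset)
{
    return offset & (s->cluster_size - 1);
}

/*
 * Final host offset for a guest offset, given the cluster-aligned
 * host offset that the old file_cluster_offset field used to hold.
 */
static inline uint64_t demo_host_offset(const DemoState *s,
                                        uint64_t file_cluster_offset,
                                        uint64_t guest_offset)
{
    /* The cluster offset itself must be aligned. */
    assert(demo_offset_into_cluster(s, file_cluster_offset) == 0);
    return file_cluster_offset + demo_offset_into_cluster(s, guest_offset);
}
```

With 64 KiB clusters, a cluster stored at host offset 0x50000 and a
guest offset of 0x12345 would give a host offset of 0x52345. Storing
that value once in the task saves every consumer from repeating the
addition.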

(*) Compressed clusters are the exception: in that case
    file_cluster_offset stored the full compressed cluster
    descriptor (offset + size). This patch does not change that,
    but it is now documented.
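
To make the "offset + size" descriptor more concrete, here is a small
sketch of how such a descriptor can be split. The arithmetic in
demo_decode() follows what qcow2_co_preadv_compressed() does in the
diff below. How the shift and masks are derived from cluster_bits is an
assumption based on my reading of the qcow2 format description, and all
demo_* / DEMO_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512ULL
#define DEMO_SECTOR_MASK (~(DEMO_SECTOR_SIZE - 1ULL))

/* Hypothetical subset of the state needed to decode a descriptor. */
typedef struct {
    int csize_shift;
    uint64_t csize_mask;
    uint64_t cluster_offset_mask;
} DemoCompressState;

typedef struct {
    uint64_t coffset; /* host offset of the compressed data */
    uint64_t csize;   /* upper bound on the compressed size, in bytes */
} DemoCompressedExtent;

/* Assumed derivation of the descriptor layout from cluster_bits. */
static inline DemoCompressState demo_compress_state(int cluster_bits)
{
    DemoCompressState s;

    assert(cluster_bits >= 9 && cluster_bits <= 21);
    s.csize_shift = 62 - (cluster_bits - 8);
    s.csize_mask = (1ULL << (cluster_bits - 8)) - 1;
    s.cluster_offset_mask = (1ULL << s.csize_shift) - 1;
    return s;
}

/* Split a descriptor into the data offset and the readable size. */
static inline DemoCompressedExtent demo_decode(const DemoCompressState *s,
                                               uint64_t cluster_descriptor)
{
    DemoCompressedExtent e;
    uint64_t nb_csectors;

    e.coffset = cluster_descriptor & s->cluster_offset_mask;
    nb_csectors = ((cluster_descriptor >> s->csize_shift) & s->csize_mask) + 1;
    /* The data may start in the middle of the first 512-byte sector. */
    e.csize = nb_csectors * DEMO_SECTOR_SIZE -
        (e.coffset & ~DEMO_SECTOR_MASK);
    return e;
}
```

This is why the descriptor cannot be treated as a plain host offset:
adding offset_into_cluster() to it would corrupt the size field.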

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 68 +++++++++++++++++++++++++--------------------------
 1 file changed, 33 insertions(+), 35 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d44b45633d..a00b0c8e45 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -74,7 +74,7 @@ typedef struct {
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-                           uint64_t file_cluster_offset,
+                           uint64_t cluster_descriptor,
                            uint64_t offset,
                            uint64_t bytes,
                            QEMUIOVector *qiov,
@@ -2041,7 +2041,7 @@ out:
 
 static coroutine_fn int
 qcow2_co_preadv_encrypted(BlockDriverState *bs,
-                           uint64_t file_cluster_offset,
+                           uint64_t host_offset,
                            uint64_t offset,
                            uint64_t bytes,
                            QEMUIOVector *qiov,
@@ -2068,16 +2068,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-    ret = bdrv_co_pread(s->data_file,
-                        file_cluster_offset + offset_into_cluster(s, offset),
-                        bytes, buf, 0);
+    ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
     if (ret < 0) {
         goto fail;
     }
 
-    if (qcow2_co_decrypt(bs,
-                         file_cluster_offset + offset_into_cluster(s, offset),
-                         offset, buf, bytes) < 0)
+    if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
     {
         ret = -EIO;
         goto fail;
@@ -2095,7 +2091,7 @@ typedef struct Qcow2AioTask {
 
     BlockDriverState *bs;
     QCow2ClusterType cluster_type; /* only for read */
-    uint64_t file_cluster_offset;
+    uint64_t host_offset; /* or full descriptor in compressed clusters */
     uint64_t offset;
     uint64_t bytes;
     QEMUIOVector *qiov;
@@ -2108,7 +2104,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
                                        AioTaskPool *pool,
                                        AioTaskFunc func,
                                        QCow2ClusterType cluster_type,
-                                       uint64_t file_cluster_offset,
+                                       uint64_t host_offset,
                                        uint64_t offset,
                                        uint64_t bytes,
                                        QEMUIOVector *qiov,
@@ -2123,7 +2119,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
         .bs = bs,
         .cluster_type = cluster_type,
         .qiov = qiov,
-        .file_cluster_offset = file_cluster_offset,
+        .host_offset = host_offset,
         .offset = offset,
         .bytes = bytes,
         .qiov_offset = qiov_offset,
@@ -2132,7 +2128,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
     trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
                          func == qcow2_co_preadv_task_entry ? "read" : "write",
-                         cluster_type, file_cluster_offset, offset, bytes,
+                         cluster_type, host_offset, offset, bytes,
                          qiov, qiov_offset);
 
     if (!pool) {
@@ -2146,13 +2142,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
                                              QCow2ClusterType cluster_type,
-                                             uint64_t file_cluster_offset,
+                                             uint64_t host_offset,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
                                              size_t qiov_offset)
 {
     BDRVQcow2State *s = bs->opaque;
-    int offset_in_cluster = offset_into_cluster(s, offset);
 
     switch (cluster_type) {
     case QCOW2_CLUSTER_ZERO_PLAIN:
@@ -2168,19 +2163,17 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
                                    qiov, qiov_offset, 0);
 
     case QCOW2_CLUSTER_COMPRESSED:
-        return qcow2_co_preadv_compressed(bs, file_cluster_offset,
+        return qcow2_co_preadv_compressed(bs, host_offset,
                                           offset, bytes, qiov, qiov_offset);
 
     case QCOW2_CLUSTER_NORMAL:
-        assert(offset_into_cluster(s, file_cluster_offset) == 0);
         if (bs->encrypted) {
-            return qcow2_co_preadv_encrypted(bs, file_cluster_offset,
+            return qcow2_co_preadv_encrypted(bs, host_offset,
                                              offset, bytes, qiov, qiov_offset);
         }
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-        return bdrv_co_preadv_part(s->data_file,
-                                   file_cluster_offset + offset_in_cluster,
+        return bdrv_co_preadv_part(s->data_file, host_offset,
                                    bytes, qiov, qiov_offset, 0);
 
     default:
@@ -2196,7 +2189,7 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
 
     assert(!t->l2meta);
 
-    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->file_cluster_offset,
+    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->host_offset,
                                 t->offset, t->bytes, t->qiov, t->qiov_offset);
 }
 
@@ -2232,11 +2225,20 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
+            /*
+             * For compressed clusters the variable cluster_offset
+             * does not actually store the offset but the full
+             * descriptor. We need to leave it unchanged because
+             * that's what qcow2_co_preadv_compressed() expects.
+             */
+            uint64_t host_offset = (ret == QCOW2_CLUSTER_COMPRESSED) ?
+                cluster_offset :
+                cluster_offset + offset_into_cluster(s, offset);
             if (!aio && cur_bytes != bytes) {
                 aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
             }
             ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, ret,
-                                 cluster_offset, offset, cur_bytes,
+                                 host_offset, offset, cur_bytes,
                                  qiov, qiov_offset, NULL);
             if (ret < 0) {
                 goto out;
@@ -2387,7 +2389,7 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
  *           not use it somehow after qcow2_co_pwritev_task() call
  */
 static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
-                                              uint64_t file_cluster_offset,
+                                              uint64_t host_offset,
                                               uint64_t offset, uint64_t bytes,
                                               QEMUIOVector *qiov,
                                               uint64_t qiov_offset,
@@ -2396,7 +2398,6 @@ static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
     int ret;
     BDRVQcow2State *s = bs->opaque;
     void *crypt_buf = NULL;
-    int offset_in_cluster = offset_into_cluster(s, offset);
     QEMUIOVector encrypted_qiov;
 
     if (bs->encrypted) {
@@ -2409,8 +2410,7 @@ static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
         }
         qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
 
-        if (qcow2_co_encrypt(bs, file_cluster_offset + offset_in_cluster,
-                             offset, crypt_buf, bytes) < 0)
+        if (qcow2_co_encrypt(bs, host_offset, offset, crypt_buf, bytes) < 0)
         {
             ret = -EIO;
             goto out_unlocked;
@@ -2435,10 +2435,8 @@ static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
      */
     if (!merge_cow(offset, bytes, qiov, qiov_offset, l2meta)) {
         BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-        trace_qcow2_writev_data(qemu_coroutine_self(),
-                                file_cluster_offset + offset_in_cluster);
-        ret = bdrv_co_pwritev_part(s->data_file,
-                                   file_cluster_offset + offset_in_cluster,
+        trace_qcow2_writev_data(qemu_coroutine_self(), host_offset);
+        ret = bdrv_co_pwritev_part(s->data_file, host_offset,
                                    bytes, qiov, qiov_offset, 0);
         if (ret < 0) {
             goto out_unlocked;
@@ -2468,7 +2466,7 @@ static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
 
     assert(!t->cluster_type);
 
-    return qcow2_co_pwritev_task(t->bs, t->file_cluster_offset,
+    return qcow2_co_pwritev_task(t->bs, t->host_offset,
                                  t->offset, t->bytes, t->qiov, t->qiov_offset,
                                  t->l2meta);
 }
@@ -2523,8 +2521,8 @@ static coroutine_fn int qcow2_co_pwritev_part(
             aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
         }
         ret = qcow2_add_task(bs, aio, qcow2_co_pwritev_task_entry, 0,
-                             cluster_offset, offset, cur_bytes,
-                             qiov, qiov_offset, l2meta);
+                             cluster_offset + offset_in_cluster, offset,
+                             cur_bytes, qiov, qiov_offset, l2meta);
         l2meta = NULL; /* l2meta is consumed by qcow2_co_pwritev_task() */
         if (ret < 0) {
             goto fail_nometa;
@@ -4358,7 +4356,7 @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
 
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
-                           uint64_t file_cluster_offset,
+                           uint64_t cluster_descriptor,
                            uint64_t offset,
                            uint64_t bytes,
                            QEMUIOVector *qiov,
@@ -4370,8 +4368,8 @@ qcow2_co_preadv_compressed(BlockDriverState *bs,
     uint8_t *buf, *out_buf;
     int offset_in_cluster = offset_into_cluster(s, offset);
 
-    coffset = file_cluster_offset & s->cluster_offset_mask;
-    nb_csectors = ((file_cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
+    coffset = cluster_descriptor & s->cluster_offset_mask;
+    nb_csectors = ((cluster_descriptor >> s->csize_shift) & s->csize_mask) + 1;
     csize = nb_csectors * QCOW2_COMPRESSED_SECTOR_SIZE -
         (coffset & ~QCOW2_COMPRESSED_SECTOR_MASK);
 
-- 
2.20.1




* [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
  2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
@ 2020-03-17 18:15 ` Alberto Garcia
  2020-03-18 12:08   ` Eric Blake
                     ` (2 more replies)
  2020-03-17 18:16 ` [PATCH v4 03/30] qcow2: Add calculate_l2_meta() Alberto Garcia
                   ` (28 subsequent siblings)
  30 siblings, 3 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:15 UTC
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

qcow2_get_cluster_offset() takes an (unaligned) guest offset and
returns the (aligned) offset of the corresponding cluster in the qcow2
image.

In practice none of the callers needs to know where the cluster starts,
so this patch makes the function calculate and return the final host
offset directly. The function is also renamed accordingly.

There is a pre-existing exception with compressed clusters: in this
case the function returns the complete cluster descriptor (containing
the offset and size of the compressed data). This does not change with
this patch but it is now documented.
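
The resulting behaviour per cluster type can be summarised with the
following simplified sketch. It is not the patched function: the
cluster type is passed in instead of being derived from the L2 entry,
the corruption checks are omitted, and the two mask values are
assumptions taken from my reading of block/qcow2.h. All demo_* / DEMO_*
names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    DEMO_CLUSTER_UNALLOCATED,
    DEMO_CLUSTER_ZERO_PLAIN,
    DEMO_CLUSTER_ZERO_ALLOC,
    DEMO_CLUSTER_NORMAL,
    DEMO_CLUSTER_COMPRESSED,
} DemoClusterType;

/* Assumed values of L2E_OFFSET_MASK and L2E_COMPRESSED_OFFSET_SIZE_MASK. */
#define DEMO_L2E_OFFSET_MASK                 0x00fffffffffffe00ULL
#define DEMO_L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL

/* Value that would end up in *host_offset for a given L2 entry. */
static inline uint64_t demo_get_host_offset(DemoClusterType type,
                                            uint64_t l2_entry,
                                            uint64_t guest_offset,
                                            uint64_t cluster_size)
{
    uint64_t offset_in_cluster;

    /* The cluster size must be a power of two for the mask to work. */
    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    offset_in_cluster = guest_offset & (cluster_size - 1);

    switch (type) {
    case DEMO_CLUSTER_COMPRESSED:
        /* Full descriptor (offset + size), not a plain offset. */
        return l2_entry & DEMO_L2E_COMPRESSED_OFFSET_SIZE_MASK;
    case DEMO_CLUSTER_ZERO_ALLOC:
    case DEMO_CLUSTER_NORMAL:
        /* Cluster start plus the position inside the cluster. */
        return (l2_entry & DEMO_L2E_OFFSET_MASK) + offset_in_cluster;
    case DEMO_CLUSTER_ZERO_PLAIN:
    case DEMO_CLUSTER_UNALLOCATED:
    default:
        /* No host data behind this guest offset. */
        return 0;
    }
}
```

Callers such as qcow2_co_block_status() can then use the returned value
directly instead of combining it with offset_into_cluster() themselves.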

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h         |  4 ++--
 block/qcow2-cluster.c | 38 ++++++++++++++++++++++----------------
 block/qcow2.c         | 24 +++++++-----------------
 3 files changed, 31 insertions(+), 35 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 0942126232..f47ef6ca4e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -679,8 +679,8 @@ int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
                           uint8_t *buf, int nb_sectors, bool enc, Error **errp);
 
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
-                             unsigned int *bytes, uint64_t *cluster_offset);
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+                          unsigned int *bytes, uint64_t *host_offset);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
                                unsigned int *bytes, uint64_t *host_offset,
                                QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 17f1363279..95f04d12cc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -496,10 +496,15 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
 
 
 /*
- * get_cluster_offset
+ * get_host_offset
  *
- * For a given offset of the virtual disk, find the cluster type and offset in
- * the qcow2 file. The offset is stored in *cluster_offset.
+ * For a given offset of the virtual disk find the equivalent host
+ * offset in the qcow2 file and store it in *host_offset. Neither
+ * offset needs to be aligned to a cluster boundary.
+ *
+ * If the cluster is unallocated then *host_offset will be 0.
+ * If the cluster is compressed then *host_offset will contain the
+ * complete compressed cluster descriptor.
  *
  * On entry, *bytes is the maximum number of contiguous bytes starting at
  * offset that we are interested in.
@@ -511,12 +516,12 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
  * cases.
  */
-int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
-                             unsigned int *bytes, uint64_t *cluster_offset)
+int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
+                          unsigned int *bytes, uint64_t *host_offset)
 {
     BDRVQcow2State *s = bs->opaque;
     unsigned int l2_index;
-    uint64_t l1_index, l2_offset, *l2_slice;
+    uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
     int c;
     unsigned int offset_in_cluster;
     uint64_t bytes_available, bytes_needed, nb_clusters;
@@ -537,7 +542,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         bytes_needed = bytes_available;
     }
 
-    *cluster_offset = 0;
+    *host_offset = 0;
 
     /* seek to the l2 offset in the l1 table */
 
@@ -570,7 +575,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
-    *cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    l2_entry = be64_to_cpu(l2_slice[l2_index]);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -578,7 +583,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
      * true */
     assert(nb_clusters <= INT_MAX);
 
-    type = qcow2_get_cluster_type(bs, *cluster_offset);
+    type = qcow2_get_cluster_type(bs, l2_entry);
     if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
                                 type == QCOW2_CLUSTER_ZERO_ALLOC)) {
         qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
@@ -599,41 +604,42 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         }
         /* Compressed clusters can only be processed one by one */
         c = 1;
-        *cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
+        *host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
         break;
     case QCOW2_CLUSTER_ZERO_PLAIN:
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
         c = count_contiguous_clusters_unallocated(bs, nb_clusters,
                                                   &l2_slice[l2_index], type);
-        *cluster_offset = 0;
+        *host_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
                                       &l2_slice[l2_index], QCOW_OFLAG_ZERO);
-        *cluster_offset &= L2E_OFFSET_MASK;
-        if (offset_into_cluster(s, *cluster_offset)) {
+        *host_offset = l2_entry & L2E_OFFSET_MASK;
+        if (offset_into_cluster(s, *host_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
                                     "Cluster allocation offset %#"
                                     PRIx64 " unaligned (L2 offset: %#" PRIx64
-                                    ", L2 index: %#x)", *cluster_offset,
+                                    ", L2 index: %#x)", *host_offset,
                                     l2_offset, l2_index);
             ret = -EIO;
             goto fail;
         }
-        if (has_data_file(bs) && *cluster_offset != offset - offset_in_cluster)
+        if (has_data_file(bs) && *host_offset != offset - offset_in_cluster)
         {
             qcow2_signal_corruption(bs, true, -1, -1,
                                     "External data file host cluster offset %#"
                                     PRIx64 " does not match guest cluster "
                                     "offset: %#" PRIx64
-                                    ", L2 index: %#x)", *cluster_offset,
+                                    ", L2 index: %#x)", *host_offset,
                                     offset - offset_in_cluster, l2_index);
             ret = -EIO;
             goto fail;
         }
+        *host_offset += offset_in_cluster;
         break;
     default:
         abort();
diff --git a/block/qcow2.c b/block/qcow2.c
index a00b0c8e45..5b6ceaa2fa 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1964,7 +1964,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
                                               BlockDriverState **file)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t cluster_offset;
+    uint64_t host_offset;
     unsigned int bytes;
     int ret, status = 0;
 
@@ -1977,7 +1977,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     }
 
     bytes = MIN(INT_MAX, count);
-    ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
+    ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset);
     qemu_co_mutex_unlock(&s->lock);
     if (ret < 0) {
         return ret;
@@ -1987,7 +1987,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
     if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
         !s->crypto) {
-        *map = cluster_offset | offset_into_cluster(s, offset);
+        *map = host_offset;
         *file = s->data_file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID;
     }
@@ -2201,7 +2201,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
     BDRVQcow2State *s = bs->opaque;
     int ret = 0;
     unsigned int cur_bytes; /* number of bytes in current iteration */
-    uint64_t cluster_offset = 0;
+    uint64_t host_offset = 0;
     AioTaskPool *aio = NULL;
 
     while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2213,7 +2213,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         }
 
         qemu_co_mutex_lock(&s->lock);
-        ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
+        ret = qcow2_get_host_offset(bs, offset, &cur_bytes, &host_offset);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
             goto out;
@@ -2225,15 +2225,6 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
-            /*
-             * For compressed clusters the variable cluster_offset
-             * does not actually store the offset but the full
-             * descriptor. We need to leave it unchanged because
-             * that's what qcow2_co_preadv_compressed() expects.
-             */
-            uint64_t host_offset = (ret == QCOW2_CLUSTER_COMPRESSED) ?
-                cluster_offset :
-                cluster_offset + offset_into_cluster(s, offset);
             if (!aio && cur_bytes != bytes) {
                 aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
             }
@@ -3735,7 +3726,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
         bytes = s->cluster_size;
         nr = s->cluster_size;
-        ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
+        ret = qcow2_get_host_offset(bs, offset, &nr, &off);
         if (ret != QCOW2_CLUSTER_UNALLOCATED &&
             ret != QCOW2_CLUSTER_ZERO_PLAIN &&
             ret != QCOW2_CLUSTER_ZERO_ALLOC) {
@@ -3800,7 +3791,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         cur_bytes = MIN(bytes, INT_MAX);
         cur_write_flags = write_flags;
 
-        ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
+        ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes, &copy_offset);
         if (ret < 0) {
             goto out;
         }
@@ -3832,7 +3823,6 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
         case QCOW2_CLUSTER_NORMAL:
             child = s->data_file;
-            copy_offset += offset_into_cluster(s, src_offset);
             break;
 
         default:
-- 
2.20.1




* [PATCH v4 03/30] qcow2: Add calculate_l2_meta()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
  2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
  2020-03-17 18:15 ` [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-09  8:30   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 04/30] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
                   ` (27 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

handle_alloc() creates a QCowL2Meta structure in order to update the
image metadata and perform the necessary copy-on-write operations.

This patch moves that code to a separate function so it can be used
from other places.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c | 77 +++++++++++++++++++++++++++++--------------
 1 file changed, 53 insertions(+), 24 deletions(-)
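
A note for readers, not part of the commit: the copy-on-write bounds that
calculate_l2_meta() derives can be sketched in standalone C as below. This
is only an illustration under stated assumptions: SketchCow and
sketch_cow_regions() are hypothetical names, the cluster size is passed in
directly instead of being read from BDRVQcow2State, and no QCowL2Meta is
allocated or queued.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cow_start/cow_end fields of QCowL2Meta. */
typedef struct SketchCow {
    unsigned start_offset, start_bytes; /* head region, before the request */
    unsigned end_offset, end_bytes;     /* tail region, after the request */
    unsigned nb_clusters;               /* clusters touched by the request */
} SketchCow;

/* Assumes cluster_size is a power of two, as it is in qcow2. */
static SketchCow sketch_cow_regions(uint64_t guest_offset, unsigned bytes,
                                    unsigned cluster_size)
{
    SketchCow r;
    unsigned cow_start_to, cow_end_from, cow_end_to;

    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);

    /* Same role as offset_into_cluster(s, guest_offset) */
    cow_start_to = (unsigned)(guest_offset & (cluster_size - 1));
    cow_end_from = cow_start_to + bytes;
    /* Same role as ROUND_UP(cow_end_from, s->cluster_size) */
    cow_end_to = (cow_end_from + cluster_size - 1) & ~(cluster_size - 1);

    r.start_offset = 0;
    r.start_bytes = cow_start_to;
    r.end_offset = cow_end_from;
    r.end_bytes = cow_end_to - cow_end_from;
    /* Same role as size_to_clusters(s, cow_end_from) */
    r.nb_clusters = cow_end_to / cluster_size;
    return r;
}
```

All offsets are relative to the start of the first cluster, so a request
that begins 4096 bytes into a cluster has a 4096-byte head region.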

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 95f04d12cc..802fc599a5 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1039,6 +1039,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
     }
 }
 
+/*
+ * For a given write request, create a new QCowL2Meta structure, add
+ * it to @m and the BDRVQcow2State.cluster_allocs list.
+ *
+ * @host_cluster_offset points to the beginning of the first cluster.
+ *
+ * @guest_offset and @bytes indicate the offset and length of the
+ * request.
+ *
+ * If @keep_old is true it means that the clusters were already
+ * allocated and will be overwritten. If false then the clusters are
+ * new and we have to decrease the reference count of the old ones.
+ */
+static void calculate_l2_meta(BlockDriverState *bs,
+                              uint64_t host_cluster_offset,
+                              uint64_t guest_offset, unsigned bytes,
+                              QCowL2Meta **m, bool keep_old)
+{
+    BDRVQcow2State *s = bs->opaque;
+    unsigned cow_start_from = 0;
+    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
+    unsigned cow_end_from = cow_start_to + bytes;
+    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
+    QCowL2Meta *old_m = *m;
+
+    *m = g_malloc0(sizeof(**m));
+    **m = (QCowL2Meta) {
+        .next           = old_m,
+
+        .alloc_offset   = host_cluster_offset,
+        .offset         = start_of_cluster(s, guest_offset),
+        .nb_clusters    = nb_clusters,
+
+        .keep_old_clusters = keep_old,
+
+        .cow_start = {
+            .offset     = cow_start_from,
+            .nb_bytes   = cow_start_to - cow_start_from,
+        },
+        .cow_end = {
+            .offset     = cow_end_from,
+            .nb_bytes   = cow_end_to - cow_end_from,
+        },
+    };
+
+    qemu_co_queue_init(&(*m)->dependent_requests);
+    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1437,35 +1487,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
     int avail_bytes = nb_clusters << s->cluster_bits;
     int nb_bytes = MIN(requested_bytes, avail_bytes);
-    QCowL2Meta *old_m = *m;
-
-    *m = g_malloc0(sizeof(**m));
-
-    **m = (QCowL2Meta) {
-        .next           = old_m,
-
-        .alloc_offset   = alloc_cluster_offset,
-        .offset         = start_of_cluster(s, guest_offset),
-        .nb_clusters    = nb_clusters,
-
-        .keep_old_clusters  = keep_old_clusters,
-
-        .cow_start = {
-            .offset     = 0,
-            .nb_bytes   = offset_into_cluster(s, guest_offset),
-        },
-        .cow_end = {
-            .offset     = nb_bytes,
-            .nb_bytes   = avail_bytes - nb_bytes,
-        },
-    };
-    qemu_co_queue_init(&(*m)->dependent_requests);
-    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 
     *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                      m, keep_old_clusters);
+
     return 1;
 
 fail:
-- 
2.20.1




* [PATCH v4 04/30] qcow2: Split cluster_needs_cow() out of count_cow_clusters()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (2 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 03/30] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-03-17 18:16 ` [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

cluster_needs_cow() is going to be needed in other places, so split it
out of count_cow_clusters().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)
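
A note for readers, not part of the commit: the decision made by
cluster_needs_cow() and the loop in count_cow_clusters() can be sketched
in standalone C as below. This is a simplified model, not the QEMU code:
SketchClusterType, SketchEntry and the sketch_* functions are hypothetical,
and the cluster type and COPIED flag are stored as plain fields instead of
being decoded from a 64-bit L2 entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the five cluster types named in the patch. */
typedef enum {
    SK_UNALLOCATED,
    SK_ZERO_PLAIN,
    SK_ZERO_ALLOC,
    SK_NORMAL,
    SK_COMPRESSED,
} SketchClusterType;

typedef struct {
    SketchClusterType type;
    bool copied; /* stands in for QCOW_OFLAG_COPIED */
} SketchEntry;

/* Only a normal cluster with the COPIED flag can be written without COW. */
static bool sketch_needs_cow(SketchEntry e)
{
    return !(e.type == SK_NORMAL && e.copied);
}

/* Count leading entries that need COW, stopping at the first that does not. */
static int sketch_count_cow(const SketchEntry *entries, int nb_clusters)
{
    int i;

    assert(nb_clusters >= 0);
    for (i = 0; i < nb_clusters; i++) {
        if (!sketch_needs_cow(entries[i])) {
            break;
        }
    }
    return i;
}
```

Note that at this point in the series a ZERO_ALLOC cluster still counts as
needing COW even when its COPIED flag is set.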

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 802fc599a5..e251d00890 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1089,6 +1089,24 @@ static void calculate_l2_meta(BlockDriverState *bs,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
+/* Returns true if writing to a cluster requires COW */
+static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+{
+    switch (qcow2_get_cluster_type(bs, l2_entry)) {
+    case QCOW2_CLUSTER_NORMAL:
+        if (l2_entry & QCOW_OFLAG_COPIED) {
+            return false;
+        }
+    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return true;
+    default:
+        abort();
+    }
+}
+
 /*
  * Returns the number of contiguous clusters that can be used for an allocating
  * write, but require COW to be performed (this includes yet unallocated space,
@@ -1101,25 +1119,11 @@ static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
 
     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
-
-        switch(cluster_type) {
-        case QCOW2_CLUSTER_NORMAL:
-            if (l2_entry & QCOW_OFLAG_COPIED) {
-                goto out;
-            }
+        if (!cluster_needs_cow(bs, l2_entry)) {
             break;
-        case QCOW2_CLUSTER_UNALLOCATED:
-        case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
-            break;
-        default:
-            abort();
         }
     }
 
-out:
     assert(i <= nb_clusters);
     return i;
 }
-- 
2.20.1




* [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (3 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 04/30] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-09 10:59   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 06/30] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
                   ` (25 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

When writing to a qcow2 file there are two functions that take a
virtual offset and return a host offset, possibly allocating new
clusters if necessary:

   - handle_copied() looks for normal data clusters that are already
     allocated and have a reference count of 1. In those clusters we
     can simply write the data and there is no need to perform any
     copy-on-write.

   - handle_alloc() looks for clusters that do need copy-on-write,
     either because they haven't been allocated yet, because their
     reference count is != 1 or because they are ZERO_ALLOC clusters.

The ZERO_ALLOC case is a bit special because those clusters are
already allocated, so they could just as well be handled in
handle_copied() (as long as copy-on-write is performed when required).

In fact, there is extra code specifically for them in handle_alloc()
that tries to reuse the existing allocation if possible and frees them
otherwise.

This patch changes the handling of ZERO_ALLOC clusters so the
semantics of these two functions are now like this:

   - handle_copied() looks for clusters that are already allocated and
     which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
     reference count of 1).

   - handle_alloc() looks for clusters for which we need a new
     allocation (all other cases).

One important difference after this change is that clusters found
in handle_copied() may now require copy-on-write, but this will be
necessary anyway once we add support for subclusters.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c | 230 ++++++++++++++++++++++++------------------
 1 file changed, 130 insertions(+), 100 deletions(-)
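
A note for readers, not part of the commit: the new split between
"can be overwritten in place" and "needs a new allocation" can be sketched
in standalone C as below. This is a simplified model under stated
assumptions: the Sketch* types and sketch_* functions are hypothetical, and
each entry carries its type, COPIED flag and host offset as separate fields
instead of a packed L2 entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SK_UNALLOCATED,
    SK_ZERO_PLAIN,
    SK_ZERO_ALLOC,
    SK_NORMAL,
    SK_COMPRESSED,
} SketchClusterType;

typedef struct {
    SketchClusterType type;
    bool copied;          /* stands in for QCOW_OFLAG_COPIED */
    uint64_t host_offset; /* stands in for l2_entry & L2E_OFFSET_MASK */
} SketchEntry;

/* NORMAL and ZERO_ALLOC clusters with COPIED set can be reused in place. */
static bool sketch_needs_new_alloc(SketchEntry e)
{
    bool allocated = e.type == SK_NORMAL || e.type == SK_ZERO_ALLOC;
    return !(allocated && e.copied);
}

/*
 * Count leading clusters that fit in one write request. With new_alloc
 * false the clusters must also be contiguous in the image file.
 */
static int sketch_count_single_write(const SketchEntry *e, int nb_clusters,
                                     uint64_t cluster_size, bool new_alloc)
{
    uint64_t expected;
    int i;

    assert(nb_clusters > 0);
    expected = e[0].host_offset;
    for (i = 0; i < nb_clusters; i++) {
        if (sketch_needs_new_alloc(e[i]) != new_alloc) {
            break;
        }
        if (!new_alloc) {
            if (e[i].host_offset != expected) {
                break;
            }
            expected += cluster_size;
        }
    }
    return i;
}
```

In this model handle_copied() corresponds to calling the counter with
new_alloc set to false, and handle_alloc() to calling it with true.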

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e251d00890..5c81046c34 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1041,13 +1041,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 
 /*
  * For a given write request, create a new QCowL2Meta structure, add
- * it to @m and the BDRVQcow2State.cluster_allocs list.
+ * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
+ * request does not need copy-on-write or changes to the L2 metadata
+ * then this function does nothing.
  *
  * @host_cluster_offset points to the beginning of the first cluster.
  *
  * @guest_offset and @bytes indicate the offset and length of the
  * request.
  *
+ * @l2_slice contains the L2 entries of all clusters involved in this
+ * write request.
+ *
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
@@ -1055,15 +1060,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
 static void calculate_l2_meta(BlockDriverState *bs,
                               uint64_t host_cluster_offset,
                               uint64_t guest_offset, unsigned bytes,
-                              QCowL2Meta **m, bool keep_old)
+                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned cow_start_from = 0;
+    int l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry;
+    unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
-    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
+    QCow2ClusterType type;
+
+    assert(nb_clusters <= s->l2_slice_size - l2_index);
+
+    /* Return if there's no COW (all clusters are normal and we keep them) */
+    if (keep_old) {
+        int i;
+        for (i = 0; i < nb_clusters; i++) {
+            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+                break;
+            }
+        }
+        if (i == nb_clusters) {
+            return;
+        }
+    }
+
+    /* Get the L2 entry of the first cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_start_from = cow_start_to;
+    } else {
+        cow_start_from = 0;
+    }
+
+    /* Get the L2 entry of the last cluster */
+    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    type = qcow2_get_cluster_type(bs, l2_entry);
+
+    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
+        cow_end_to = cow_end_from;
+    } else {
+        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+    }
 
     *m = g_malloc0(sizeof(**m));
     **m = (QCowL2Meta) {
@@ -1089,18 +1132,22 @@ static void calculate_l2_meta(BlockDriverState *bs,
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
 }
 
-/* Returns true if writing to a cluster requires COW */
-static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
+/*
+ * Returns true if writing to the cluster pointed to by @l2_entry
+ * requires a new allocation (that is, if the cluster is unallocated
+ * or has refcount > 1 and therefore cannot be written in-place).
+ */
+static bool cluster_needs_new_alloc(BlockDriverState *bs, uint64_t l2_entry)
 {
     switch (qcow2_get_cluster_type(bs, l2_entry)) {
     case QCOW2_CLUSTER_NORMAL:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
         if (l2_entry & QCOW_OFLAG_COPIED) {
             return false;
         }
     case QCOW2_CLUSTER_UNALLOCATED:
     case QCOW2_CLUSTER_COMPRESSED:
     case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
         return true;
     default:
         abort();
@@ -1108,20 +1155,38 @@ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
 }
 
 /*
- * Returns the number of contiguous clusters that can be used for an allocating
- * write, but require COW to be performed (this includes yet unallocated space,
- * which must copy from the backing file)
+ * Returns the number of contiguous clusters that can be written to
+ * using one single write request, starting from @l2_index.
+ * At most @nb_clusters are checked.
+ *
+ * If @new_alloc is true this counts clusters that are either
+ * unallocated, or allocated but with refcount > 1 (so they need to be
+ * newly allocated and COWed).
+ *
+ * If @new_alloc is false this counts clusters that are already
+ * allocated and can be overwritten in-place (this includes clusters
+ * of type QCOW2_CLUSTER_ZERO_ALLOC).
  */
-static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
-    uint64_t *l2_slice, int l2_index)
+static int count_single_write_clusters(BlockDriverState *bs, int nb_clusters,
+                                       uint64_t *l2_slice, int l2_index,
+                                       bool new_alloc)
 {
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
-        if (!cluster_needs_cow(bs, l2_entry)) {
+        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        if (cluster_needs_new_alloc(bs, l2_entry) != new_alloc) {
             break;
         }
+        if (!new_alloc) {
+            if (expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+                break;
+            }
+            expected_offset += s->cluster_size;
+        }
     }
 
     assert(i <= nb_clusters);
@@ -1192,10 +1257,10 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 }
 
 /*
- * Checks how many already allocated clusters that don't require a copy on
- * write there are at the given guest_offset (up to *bytes). If *host_offset is
- * not INV_OFFSET, only physically contiguous clusters beginning at this host
- * offset are counted.
+ * Checks how many already allocated clusters that don't require a new
+ * allocation there are at the given guest_offset (up to *bytes).
+ * If *host_offset is not INV_OFFSET, only physically contiguous clusters
+ * beginning at this host offset are counted.
  *
  * Note that guest_offset may not be cluster aligned. In this case, the
  * returned *host_offset points to exact byte referenced by guest_offset and
@@ -1204,12 +1269,12 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
  * Returns:
  *   0:     if no allocated clusters are available at the given offset.
  *          *bytes is normally unchanged. It is set to 0 if the cluster
- *          is allocated and doesn't need COW, but doesn't have the right
- *          physical offset.
+ *          is allocated and can be overwritten in-place but doesn't have
+ *          the right physical offset.
  *
- *   1:     if allocated clusters that don't require a COW are available at
- *          the requested offset. *bytes may have decreased and describes
- *          the length of the area that can be written to.
+ *   1:     if allocated clusters that can be overwritten in place are
+ *          available at the requested offset. *bytes may have decreased
+ *          and describes the length of the area that can be written to.
  *
  *  -errno: in error cases
  */
@@ -1239,7 +1304,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     l2_index = offset_to_l2_slice_index(s, guest_offset);
     nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
-    assert(nb_clusters <= INT_MAX);
+    /* Limit total byte count to BDRV_REQUEST_MAX_BYTES */
+    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
 
     /* Find L2 entry for the first involved cluster */
     ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
@@ -1249,18 +1315,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
 
     cluster_offset = be64_to_cpu(l2_slice[l2_index]);
 
-    /* Check how many clusters are already allocated and don't need COW */
-    if (qcow2_get_cluster_type(bs, cluster_offset) == QCOW2_CLUSTER_NORMAL
-        && (cluster_offset & QCOW_OFLAG_COPIED))
-    {
+    if (!cluster_needs_new_alloc(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
         bool offset_matches =
             (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
 
         if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
+            qcow2_signal_corruption(bs, true, -1, -1, "%s cluster offset "
                                     "%#llx unaligned (guest offset: %#" PRIx64
-                                    ")", cluster_offset & L2E_OFFSET_MASK,
+                                    ")", cluster_offset & QCOW_OFLAG_ZERO ?
+                                    "Preallocated zero" : "Data",
+                                    cluster_offset & L2E_OFFSET_MASK,
                                     guest_offset);
             ret = -EIO;
             goto out;
@@ -1273,15 +1338,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         }
 
         /* We keep all QCOW_OFLAG_COPIED clusters */
-        keep_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index],
-                                      QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
+        keep_clusters = count_single_write_clusters(bs, nb_clusters, l2_slice,
+                                                    l2_index, false);
         assert(keep_clusters <= nb_clusters);
 
         *bytes = MIN(*bytes,
                  keep_clusters * s->cluster_size
                  - offset_into_cluster(s, guest_offset));
+        assert(*bytes != 0);
+
+        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
+                          *bytes, l2_slice, m, true);
 
         ret = 1;
     } else {
@@ -1357,9 +1424,10 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
 }
 
 /*
- * Allocates new clusters for an area that either is yet unallocated or needs a
- * copy on write. If *host_offset is not INV_OFFSET, clusters are only
- * allocated if the new allocation can match the specified host offset.
+ * Allocates new clusters for an area that is either still unallocated or
+ * cannot be overwritten in-place. If *host_offset is not INV_OFFSET,
+ * clusters are only allocated if the new allocation can match the specified
+ * host offset.
  *
  * Note that guest_offset may not be cluster aligned. In this case, the
  * returned *host_offset points to exact byte referenced by guest_offset and
@@ -1382,12 +1450,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     BDRVQcow2State *s = bs->opaque;
     int l2_index;
     uint64_t *l2_slice;
-    uint64_t entry;
     uint64_t nb_clusters;
     int ret;
-    bool keep_old_clusters = false;
 
-    uint64_t alloc_cluster_offset = INV_OFFSET;
+    uint64_t alloc_cluster_offset;
 
     trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
                              *bytes);
@@ -1402,10 +1468,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
 
     l2_index = offset_to_l2_slice_index(s, guest_offset);
     nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
-    assert(nb_clusters <= INT_MAX);
-
-    /* Limit total allocation byte count to INT_MAX */
-    nb_clusters = MIN(nb_clusters, INT_MAX >> s->cluster_bits);
+    /* Limit total allocation byte count to BDRV_REQUEST_MAX_BYTES */
+    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
 
     /* Find L2 entry for the first involved cluster */
     ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
@@ -1413,67 +1477,32 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    entry = be64_to_cpu(l2_slice[l2_index]);
-    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index);
+    nb_clusters = count_single_write_clusters(bs, nb_clusters,
+                                              l2_slice, l2_index, true);
 
     /* This function is only called when there were no non-COW clusters, so if
      * we can't find any unallocated or COW clusters either, something is
      * wrong with our code. */
     assert(nb_clusters > 0);
 
-    if (qcow2_get_cluster_type(bs, entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
-        (entry & QCOW_OFLAG_COPIED) &&
-        (*host_offset == INV_OFFSET ||
-         start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
-    {
-        int preallocated_nb_clusters;
-
-        if (offset_into_cluster(s, entry & L2E_OFFSET_MASK)) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Preallocated zero "
-                                    "cluster offset %#llx unaligned (guest "
-                                    "offset: %#" PRIx64 ")",
-                                    entry & L2E_OFFSET_MASK, guest_offset);
-            ret = -EIO;
-            goto fail;
-        }
-
-        /* Try to reuse preallocated zero clusters; contiguous normal clusters
-         * would be fine, too, but count_cow_clusters() above has limited
-         * nb_clusters already to a range of COW clusters */
-        preallocated_nb_clusters =
-            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_COPIED);
-        assert(preallocated_nb_clusters > 0);
-
-        nb_clusters = preallocated_nb_clusters;
-        alloc_cluster_offset = entry & L2E_OFFSET_MASK;
-
-        /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
-         * should not free them. */
-        keep_old_clusters = true;
+    /* Allocate at a given offset in the image file */
+    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
+        start_of_cluster(s, *host_offset);
+    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
+                                  &nb_clusters);
+    if (ret < 0) {
+        goto out;
     }
 
-    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-
-    if (alloc_cluster_offset == INV_OFFSET) {
-        /* Allocate, if necessary at a given offset in the image file */
-        alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
-                               start_of_cluster(s, *host_offset);
-        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
-                                      &nb_clusters);
-        if (ret < 0) {
-            goto fail;
-        }
-
-        /* Can't extend contiguous allocation */
-        if (nb_clusters == 0) {
-            *bytes = 0;
-            return 0;
-        }
-
-        assert(alloc_cluster_offset != INV_OFFSET);
+    /* Can't extend contiguous allocation */
+    if (nb_clusters == 0) {
+        *bytes = 0;
+        ret = 0;
+        goto out;
     }
 
+    assert(alloc_cluster_offset != INV_OFFSET);
+
     /*
      * Save info needed for meta data update.
      *
@@ -1496,13 +1525,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
-                      m, keep_old_clusters);
+    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
+                      m, false);
 
-    return 1;
+    ret = 1;
 
-fail:
-    if (*m && (*m)->nb_clusters > 0) {
+out:
+    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
+    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
         QLIST_REMOVE(*m, next_in_flight);
     }
     return ret;
-- 
2.20.1




* [PATCH v4 06/30] qcow2: Add get_l2_entry() and set_l2_entry()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (4 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-10  8:48   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature Alberto Garcia
                   ` (24 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The size of an L2 entry is 64 bits, but if we want to have subclusters
we need extended L2 entries. This means that we have to access L2
tables and slices differently depending on whether an image has
extended L2 entries or not.

This patch replaces all l2_slice[] accesses with calls to
get_l2_entry() and set_l2_entry().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          | 12 ++++++++
 block/qcow2-cluster.c  | 63 ++++++++++++++++++++++--------------------
 block/qcow2-refcount.c | 17 ++++++------
 3 files changed, 54 insertions(+), 38 deletions(-)
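
A note for readers, not part of the commit: the accessor pattern introduced
here can be tried outside QEMU with the sketch below. It is only an
illustration: the sketch_* names are hypothetical, the BDRVQcow2State
argument is omitted, and be64_to_cpu()/cpu_to_be64() are replaced by
portable byte-wise conversions so the block builds on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit big-endian value, independent of host byte order. */
static uint64_t sketch_be64_load(const uint64_t *p)
{
    unsigned char b[8];
    uint64_t v = 0;
    int i;

    memcpy(b, p, sizeof(b));
    for (i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

/* Store a 64-bit value in big-endian byte order. */
static void sketch_be64_store(uint64_t *p, uint64_t v)
{
    unsigned char b[8];
    int i;

    for (i = 7; i >= 0; i--) {
        b[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
    memcpy(p, b, sizeof(b));
}

/* All slice accesses go through these two helpers. */
static uint64_t sketch_get_l2_entry(const uint64_t *l2_slice, int idx)
{
    assert(idx >= 0);
    return sketch_be64_load(&l2_slice[idx]);
}

static void sketch_set_l2_entry(uint64_t *l2_slice, int idx, uint64_t entry)
{
    assert(idx >= 0);
    sketch_be64_store(&l2_slice[idx], entry);
}
```

Because callers never index l2_slice[] directly, a later change in the
entry layout only needs to touch the two helpers.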

diff --git a/block/qcow2.h b/block/qcow2.h
index f47ef6ca4e..7754d9bd02 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -495,6 +495,18 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                    int idx)
+{
+    return be64_to_cpu(l2_slice[idx]);
+}
+
+static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
+                                int idx, uint64_t entry)
+{
+    l2_slice[idx] = cpu_to_be64(entry);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 5c81046c34..cd48ab0223 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -383,12 +383,13 @@ fail:
  * cluster which may require a different handling)
  */
 static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, uint64_t stop_flags)
+        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
     QCow2ClusterType first_cluster_type;
     uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = be64_to_cpu(l2_slice[0]);
+    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t offset = first_entry & mask;
 
     first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
@@ -401,7 +402,7 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
            first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = be64_to_cpu(l2_slice[i]) & mask;
+        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
         if (offset + (uint64_t) i * cluster_size != l2_entry) {
             break;
         }
@@ -417,14 +418,16 @@ static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
 static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
                                                  int nb_clusters,
                                                  uint64_t *l2_slice,
+                                                 int l2_index,
                                                  QCow2ClusterType wanted_type)
 {
+    BDRVQcow2State *s = bs->opaque;
     int i;
 
     assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
            wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = be64_to_cpu(l2_slice[i]);
+        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
         QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
 
         if (type != wanted_type) {
@@ -575,7 +578,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
-    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -610,14 +613,14 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
         c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  &l2_slice[l2_index], type);
+                                                  l2_slice, l2_index, type);
         *host_offset = 0;
         break;
     case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      &l2_slice[l2_index], QCOW_OFLAG_ZERO);
+                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
         *host_offset = l2_entry & L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *host_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -771,7 +774,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     /* Compression can't overwrite anything. Fail if the cluster was already
      * allocated. */
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
     if (cluster_offset & L2E_OFFSET_MASK) {
         qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
         return -EIO;
@@ -800,7 +803,7 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-    l2_slice[l2_index] = cpu_to_be64(cluster_offset);
+    set_l2_entry(s, l2_slice, l2_index, cluster_offset);
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
@@ -993,14 +996,14 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
          * cluster the second one has to do RMW (which is done above by
          * perform_cow()), update l2 table with its cluster pointer and free
          * old cluster. This is what this loop does */
-        if (l2_slice[l2_index + i] != 0) {
-            old_cluster[j++] = l2_slice[l2_index + i];
+        if (get_l2_entry(s, l2_slice, l2_index + i) != 0) {
+            old_cluster[j++] = get_l2_entry(s, l2_slice, l2_index + i);
         }
 
         /* The offset must fit in the offset field of the L2 table entry */
         assert((offset & L2E_OFFSET_MASK) == offset);
 
-        l2_slice[l2_index + i] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
+        set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
      }
 
 
@@ -1014,8 +1017,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      */
     if (!m->keep_old_clusters && j != 0) {
         for (i = 0; i < j; i++) {
-            qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
-                                    QCOW2_DISCARD_NEVER);
+            qcow2_free_any_clusters(bs, old_cluster[i], 1, QCOW2_DISCARD_NEVER);
         }
     }
 
@@ -1078,7 +1080,7 @@ static void calculate_l2_meta(BlockDriverState *bs,
     if (keep_old) {
         int i;
         for (i = 0; i < nb_clusters; i++) {
-            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
             if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
                 break;
             }
@@ -1089,7 +1091,7 @@ static void calculate_l2_meta(BlockDriverState *bs,
     }
 
     /* Get the L2 entry of the first cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1099,7 +1101,7 @@ static void calculate_l2_meta(BlockDriverState *bs,
     }
 
     /* Get the L2 entry of the last cluster */
-    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
+    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
     type = qcow2_get_cluster_type(bs, l2_entry);
 
     if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
@@ -1172,12 +1174,12 @@ static int count_single_write_clusters(BlockDriverState *bs, int nb_clusters,
                                        bool new_alloc)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
     uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
     int i;
 
     for (i = 0; i < nb_clusters; i++) {
-        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         if (cluster_needs_new_alloc(bs, l2_entry) != new_alloc) {
             break;
         }
@@ -1313,7 +1315,7 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
         return ret;
     }
 
-    cluster_offset = be64_to_cpu(l2_slice[l2_index]);
+    cluster_offset = get_l2_entry(s, l2_slice, l2_index);
 
     if (!cluster_needs_new_alloc(bs, cluster_offset)) {
         /* If a specific host_offset is required, check it */
@@ -1694,7 +1696,7 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     for (i = 0; i < nb_clusters; i++) {
         uint64_t old_l2_entry;
 
-        old_l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
+        old_l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * If full_discard is false, make sure that a discarded area reads back
@@ -1734,9 +1736,9 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (!full_discard && s->qcow_version >= 3) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
-            l2_slice[l2_index + i] = cpu_to_be64(0);
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
         }
 
         /* Then decrease the refcount */
@@ -1816,7 +1818,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
         uint64_t old_offset;
         QCow2ClusterType cluster_type;
 
-        old_offset = be64_to_cpu(l2_slice[l2_index + i]);
+        old_offset = get_l2_entry(s, l2_slice, l2_index + i);
 
         /*
          * Minimize L2 changes if the cluster already reads back as
@@ -1830,10 +1832,11 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            l2_slice[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            l2_slice[l2_index + i] |= cpu_to_be64(QCOW_OFLAG_ZERO);
+            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
+            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
         }
     }
 
@@ -1971,7 +1974,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             }
 
             for (j = 0; j < s->l2_slice_size; j++) {
-                uint64_t l2_entry = be64_to_cpu(l2_slice[j]);
+                uint64_t l2_entry = get_l2_entry(s, l2_slice, j);
                 int64_t offset = l2_entry & L2E_OFFSET_MASK;
                 QCow2ClusterType cluster_type =
                     qcow2_get_cluster_type(bs, l2_entry);
@@ -1985,7 +1988,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                     if (!bs->backing) {
                         /* not backed; therefore we can simply deallocate the
                          * cluster */
-                        l2_slice[j] = 0;
+                        set_l2_entry(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
                     }
@@ -2051,9 +2054,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 }
 
                 if (l2_refcount == 1) {
-                    l2_slice[j] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
+                    set_l2_entry(s, l2_slice, j, offset | QCOW_OFLAG_COPIED);
                 } else {
-                    l2_slice[j] = cpu_to_be64(offset);
+                    set_l2_entry(s, l2_slice, j, offset);
                 }
                 l2_dirty = true;
             }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7ef1c0e42a..141e4fdcb1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1310,7 +1310,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     uint64_t cluster_index;
                     uint64_t offset;
 
-                    entry = be64_to_cpu(l2_slice[j]);
+                    entry = get_l2_entry(s, l2_slice, j);
                     old_entry = entry;
                     entry &= ~QCOW_OFLAG_COPIED;
                     offset = entry & L2E_OFFSET_MASK;
@@ -1384,7 +1384,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                             qcow2_cache_set_dependency(bs, s->l2_table_cache,
                                                        s->refcount_block_cache);
                         }
-                        l2_slice[j] = cpu_to_be64(entry);
+                        set_l2_entry(s, l2_slice, j, entry);
                         qcow2_cache_entry_mark_dirty(s->l2_table_cache,
                                                      l2_slice);
                     }
@@ -1617,7 +1617,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
 
     /* Do the actual checks */
     for(i = 0; i < s->l2_size; i++) {
-        l2_entry = be64_to_cpu(l2_table[i]);
+        l2_entry = get_l2_entry(s, l2_table, i);
 
         switch (qcow2_get_cluster_type(bs, l2_entry)) {
         case QCOW2_CLUSTER_COMPRESSED:
@@ -1686,7 +1686,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
-                        l2_table[i] = cpu_to_be64(l2_entry);
+                        set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, sizeof(uint64_t), false);
                         if (ret < 0) {
@@ -1914,7 +1914,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         for (j = 0; j < s->l2_size; j++) {
-            uint64_t l2_entry = be64_to_cpu(l2_table[j]);
+            uint64_t l2_entry = get_l2_entry(s, l2_table, j);
             uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
             QCow2ClusterType cluster_type = qcow2_get_cluster_type(bs, l2_entry);
 
@@ -1937,9 +1937,10 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
                             "l2_entry=%" PRIx64 " refcount=%" PRIu64 "\n",
                             repair ? "Repairing" : "ERROR", l2_entry, refcount);
                     if (repair) {
-                        l2_table[j] = cpu_to_be64(refcount == 1
-                                    ? l2_entry |  QCOW_OFLAG_COPIED
-                                    : l2_entry & ~QCOW_OFLAG_COPIED);
+                        set_l2_entry(s, l2_table, j,
+                                     refcount == 1 ?
+                                     l2_entry |  QCOW_OFLAG_COPIED :
+                                     l2_entry & ~QCOW_OFLAG_COPIED);
                         l2_dirty++;
                     }
                 }
-- 
2.20.1
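
For readers who want to see what the two accessors introduced in this
patch amount to, here is a minimal standalone sketch. It assumes only
what the diff shows: L2 entries are stored big-endian and converted on
every access. The helper names (load_be64, store_be64, get_entry,
set_entry) are hypothetical stand-ins, not QEMU's be64_to_cpu() and
cpu_to_be64() implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 64-bit big-endian value stored at *p, on any host endianness. */
static inline uint64_t load_be64(const uint64_t *p)
{
    unsigned char b[8];
    uint64_t v = 0;

    memcpy(b, p, sizeof(b));
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

/* Store v at *p in big-endian byte order. */
static inline void store_be64(uint64_t *p, uint64_t v)
{
    unsigned char b[8];

    for (int i = 7; i >= 0; i--) {
        b[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
    memcpy(p, b, sizeof(b));
}

/* Hypothetical equivalent of get_l2_entry(): returns a host-order value. */
static inline uint64_t get_entry(const uint64_t *l2_slice, int idx)
{
    assert(idx >= 0);
    return load_be64(&l2_slice[idx]);
}

/* Hypothetical equivalent of set_l2_entry(): takes a host-order value. */
static inline void set_entry(uint64_t *l2_slice, int idx, uint64_t entry)
{
    assert(idx >= 0);
    store_be64(&l2_slice[idx], entry);
}
```

Keeping the conversion inside the accessors means callers only ever
handle host-order values, which is why the old_cluster[] array in
qcow2_alloc_cluster_link_l2() no longer needs a be64_to_cpu() call.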




* [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (5 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 06/30] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-08 11:09   ` Max Reitz
  2020-04-09 15:12   ` Eric Blake
  2020-03-17 18:16 ` [PATCH v4 08/30] qcow2: Add dummy has_subclusters() function Alberto Garcia
                   ` (23 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
 docs/qcow2-cache.txt   | 19 +++++++++++-
 2 files changed, 83 insertions(+), 4 deletions(-)
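
As a rough illustration of the format change documented in this patch,
the sketch below decodes the subcluster allocation bitmap of a standard
(non-compressed) cluster and computes the number of extended L2 entries
per table. Only the bit layout and the formula come from the text; the
enum, macro and function names are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the value 32 comes from the documentation. */
#define SUBCLUSTERS_PER_CLUSTER 32

typedef enum {
    SC_UNALLOCATED, /* read from the backing file, or zeros if none */
    SC_ALLOCATED,   /* data is found via the host cluster offset */
    SC_ZERO,        /* subcluster reads as zeros */
    SC_INVALID      /* both bits set, which the text above forbids */
} SubclusterStatus;

/* Number of L2 entries per table when each entry is 128 bits wide. */
static inline uint64_t extended_l2_entries(uint64_t cluster_size)
{
    return cluster_size / (2 * sizeof(uint64_t));
}

/*
 * Classify subcluster 'sc' from the last 64 bits of an extended L2
 * entry: bit sc is the allocation bit, bit sc + 32 is the zero bit.
 */
static inline SubclusterStatus subcluster_status(uint64_t bitmap, unsigned sc)
{
    assert(sc < SUBCLUSTERS_PER_CLUSTER);

    bool alloc = (bitmap >> sc) & 1;
    bool zero = (bitmap >> (sc + 32)) & 1;

    if (alloc && zero) {
        return SC_INVALID;
    }
    if (zero) {
        return SC_ZERO;
    }
    return alloc ? SC_ALLOCATED : SC_UNALLOCATED;
}
```

With 64 KiB clusters this gives 4096 entries per L2 table, half of the
8192 entries available with standard 64-bit entries.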

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 5597e24474..2e8cad38c4 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
                     as the maximum cluster size and won't be able to open images
                     with larger cluster sizes.
 
+                    Note: if the image has Extended L2 Entries then cluster_bits
+                    must be at least 14 (i.e. 16384 byte clusters).
+
          24 - 31:   size
                     Virtual disk size in bytes.
 
@@ -114,7 +117,12 @@ the next fields through header_length.
                                 clusters. The compression_type field must be
                                 present and not zero.
 
-                    Bits 4-63:  Reserved (set to 0)
+                    Bit 4:      Extended L2 Entries.  If this bit is set then
+                                L2 table entries use an extended format that
+                                allows subcluster-based allocation. See the
+                                Extended L2 Entries section for more details.
+
+                    Bits 5-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -493,7 +501,7 @@ cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-    l2_entries = (cluster_size / sizeof(uint64_t))
+    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
 
     l2_index = (offset / cluster_size) % l2_entries
     l1_index = (offset / cluster_size) / l2_entries
@@ -503,6 +511,8 @@ obtained as follows:
 
     return cluster_offset + (offset % cluster_size)
 
+    [*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
     Bit  0 -  8:    Reserved (set to 0)
@@ -543,7 +553,8 @@ Standard Cluster Descriptor:
                     nor is data read from the backing file if the cluster is
                     unallocated.
 
-                    With version 2, this is always 0.
+                    With version 2 or with extended L2 entries (see the next
+                    section), this is always 0.
 
          1 -  8:    Reserved (set to 0)
 
@@ -580,6 +591,57 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated the same as in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+    Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+                    1: the subcluster is allocated. In this case the
+                       host cluster offset field must contain a valid
+                       offset.
+                    0: the subcluster is not allocated. In this case
+                       read requests shall go to the backing file or
+                       return zeros if there is no backing file data.
+
+                    Bits are assigned starting from the least significant
+                    one (i.e. bit x is used for subcluster x).
+
+        32 -  63:   Subcluster reads as zeros (one bit per subcluster)
+
+                    1: the subcluster reads as zeros. In this case the
+                       allocation status bit must be unset. The host
+                       cluster offset field may or may not be set.
+                    0: no effect.
+
+                    Bits are assigned starting from the least significant
+                    one (i.e. bit x is used for subcluster x - 32).
+
+Subcluster Allocation Bitmap (for compressed clusters):
+
+    Bit  0 -  63:   Reserved (set to 0)
+                    Compressed clusters don't have subclusters,
+                    so this field is not used.
 
 == Snapshots ==
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index d57f409861..5f763aa6bb 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@
 qcow2 L2/refcount cache configuration
 =====================================
-Copyright (C) 2015, 2018 Igalia, S.L.
+Copyright (C) 2015, 2018-2020 Igalia, S.L.
 Author: Alberto Garcia <berto@igalia.com>
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -222,3 +222,20 @@ support this functionality, and is 0 (disabled) on other platforms.
 This functionality currently relies on the MADV_DONTNEED argument for
 madvise() to actually free the memory. This is a Linux-specific feature,
 so cache-clean-interval is not supported on other systems.
+
+
+Extended L2 Entries
+-------------------
+All numbers shown in this document are valid for qcow2 images with normal
+64-bit L2 entries.
+
+Images with extended L2 entries need twice as much L2 metadata, so the L2
+cache size must be twice as large for the same disk space.
+
+   disk_size = l2_cache_size * cluster_size / 16
+
+i.e.
+
+   l2_cache_size = disk_size * 16 / cluster_size
+
+Refcount blocks are not affected by this.
-- 
2.20.1




* [PATCH v4 08/30] qcow2: Add dummy has_subclusters() function
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (6 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-10  9:11   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
                   ` (22 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This function will be used by the qcow2 code to check if an image has
subclusters or not.

At the moment this simply returns false. Once all the patches needed
for subcluster support are in place, QEMU will be able to create and
read images with subclusters, and this function will return the actual
value.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 7754d9bd02..55298750bd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -495,6 +495,12 @@ typedef enum QCow2MetadataOverlap {
 
 #define INV_OFFSET (-1ULL)
 
+static inline bool has_subclusters(BDRVQcow2State *s)
+{
+    /* FIXME: Return false until this feature is complete */
+    return false;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
-- 
2.20.1




* [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (7 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 08/30] qcow2: Add dummy has_subclusters() function Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-08 11:12   ` Max Reitz
  2020-04-10  9:45   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 10/30] qcow2: Add offset_to_sc_index() Alberto Garcia
                   ` (21 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This patch adds the following new fields to BDRVQcow2State:

- subclusters_per_cluster: Number of subclusters in a cluster
- subcluster_size: The size of each subcluster, in bytes
- subcluster_bits: Number of bits, so that
  1 << subcluster_bits == subcluster_size

Images without subclusters are treated as if they had exactly one,
with subcluster_size = cluster_size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 5 +++++
 block/qcow2.c | 5 +++++
 2 files changed, 10 insertions(+)
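
The relationship between the three new fields can be sketched outside
of QEMU as follows. The struct and function names are hypothetical, and
ctz_u32() is a portable stand-in for QEMU's ctz32(). The 16 KiB minimum
cluster size for images with subclusters is taken from the
documentation in patch 07.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three fields added by this patch. */
typedef struct {
    int subclusters_per_cluster;
    int subcluster_size;
    int subcluster_bits;
} SubclusterGeometry;

/* Count trailing zero bits; v must be non-zero. */
static inline int ctz_u32(uint32_t v)
{
    int n = 0;

    assert(v != 0);
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

static inline SubclusterGeometry subcluster_geometry(int cluster_size,
                                                     bool has_subclusters)
{
    SubclusterGeometry g;

    /* Cluster sizes are powers of two. */
    assert(cluster_size > 0 && (cluster_size & (cluster_size - 1)) == 0);
    /* Extended L2 entries require cluster_bits >= 14. */
    assert(!has_subclusters || cluster_size >= 16384);

    g.subclusters_per_cluster = has_subclusters ? 32 : 1;
    g.subcluster_size = cluster_size / g.subclusters_per_cluster;
    g.subcluster_bits = ctz_u32((uint32_t)g.subcluster_size);
    return g;
}
```

For 64 KiB clusters this gives 2 KiB subclusters (subcluster_bits = 11)
with subclusters, and a single 64 KiB subcluster (subcluster_bits = 16)
without them.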

diff --git a/block/qcow2.h b/block/qcow2.h
index 55298750bd..3052b14dc0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -78,6 +78,8 @@
 /* The cluster reads as all zeros */
 #define QCOW_OFLAG_ZERO (1ULL << 0)
 
+#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -284,6 +286,9 @@ typedef struct BDRVQcow2State {
     int cluster_bits;
     int cluster_size;
     int l2_slice_size;
+    int subcluster_bits;
+    int subcluster_size;
+    int subclusters_per_cluster;
     int l2_bits;
     int l2_size;
     int l1_size;
diff --git a/block/qcow2.c b/block/qcow2.c
index 5b6ceaa2fa..239e0ad3d9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1380,6 +1380,11 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->subclusters_per_cluster =
+        has_subclusters(s) ? QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
+    s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
+    s->subcluster_bits = ctz32(s->subcluster_size);
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
-- 
2.20.1




* [PATCH v4 10/30] qcow2: Add offset_to_sc_index()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (8 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-13 11:02   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 11/30] qcow2: Add l2_entry_size() Alberto Garcia
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

For a given offset, return the subcluster number within its cluster
(e.g. with 32 subclusters per cluster it returns a number between 0
and 31).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h | 5 +++++
 1 file changed, 5 insertions(+)
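
A worked example may help here. The sketch below reproduces the
shift-and-mask computation with the state passed in as plain
parameters; sc_index() is a hypothetical name, and the values in the
note that follows assume 64 KiB clusters split into 32 subclusters of
2 KiB each (subcluster_bits = 11).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Subcluster number of 'offset' within its cluster. The mask only
 * works because subclusters_per_cluster is a power of two.
 */
static inline int sc_index(int64_t offset, int subcluster_bits,
                           int subclusters_per_cluster)
{
    assert(offset >= 0);
    assert((subclusters_per_cluster & (subclusters_per_cluster - 1)) == 0);

    return (int)((offset >> subcluster_bits) &
                 (subclusters_per_cluster - 1));
}
```

Under those assumptions, offsets 0 and 2047 map to subcluster 0, offset
2048 maps to subcluster 1, offset 65535 maps to subcluster 31, and
offset 65536 wraps back to subcluster 0 of the next cluster. For images
without subclusters (one subcluster per cluster) the mask is 0, so the
result is always 0.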

diff --git a/block/qcow2.h b/block/qcow2.h
index 3052b14dc0..06929072d2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -566,6 +566,11 @@ static inline int offset_to_l2_slice_index(BDRVQcow2State *s, int64_t offset)
     return (offset >> s->cluster_bits) & (s->l2_slice_size - 1);
 }
 
+static inline int offset_to_sc_index(BDRVQcow2State *s, int64_t offset)
+{
+    return (offset >> s->subcluster_bits) & (s->subclusters_per_cluster - 1);
+}
+
 static inline int64_t qcow2_vm_state_offset(BDRVQcow2State *s)
 {
     return (int64_t)s->l1_vm_state_index << (s->cluster_bits + s->l2_bits);
-- 
2.20.1




* [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (9 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 10/30] qcow2: Add offset_to_sc_index() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-14  9:44   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 12/30] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
                   ` (19 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

qcow2 images with subclusters have 128-bit L2 entries. The first 64
bits contain the same information as traditional images and the last
64 bits form a bitmap with the status of each individual subcluster.

Because of that we can no longer assume that L2 entries are
sizeof(uint64_t) bytes long. l2_entry_size() returns the proper size,
in bytes, for the image.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h          |  9 +++++++++
 block/qcow2-cluster.c  | 12 ++++++------
 block/qcow2-refcount.c | 14 ++++++++------
 block/qcow2.c          |  8 ++++----
 4 files changed, 27 insertions(+), 16 deletions(-)
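
The check_refcounts_l2() hunk below relies on two different ways of
addressing entry i of an L2 table: as a byte offset in the image file,
and as an index into an array of uint64_t. A minimal sketch of both
calculations, with hypothetical function names and the feature flag
passed as a plain boolean instead of BDRVQcow2State:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Entry size in bytes: 64 bits normally, 128 bits with subclusters. */
static inline size_t entry_size(bool has_subclusters)
{
    return has_subclusters ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Byte offset in the image file of entry i of the table at l2_offset. */
static inline uint64_t l2_entry_file_offset(uint64_t l2_offset, int i,
                                            bool has_subclusters)
{
    assert(i >= 0);
    return l2_offset + (uint64_t)i * entry_size(has_subclusters);
}

/* Index of the first 64-bit word of entry i in a uint64_t array. */
static inline int l2_entry_word_index(int i, bool has_subclusters)
{
    assert(i >= 0);
    return i * (int)(entry_size(has_subclusters) / sizeof(uint64_t));
}
```

With extended entries, entry 3 starts 48 bytes into the table and at
word index 6. With normal entries it starts 24 bytes in, at word
index 3.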

diff --git a/block/qcow2.h b/block/qcow2.h
index 06929072d2..1eb4b46807 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,10 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* Size of normal and extended L2 entries */
+#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
+#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
+
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
@@ -506,6 +510,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
     return false;
 }
 
+static inline size_t l2_entry_size(BDRVQcow2State *s)
+{
+    return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
+}
+
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index cd48ab0223..41a23c5305 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -208,7 +208,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
                    uint64_t l2_offset, uint64_t **l2_slice)
 {
     BDRVQcow2State *s = bs->opaque;
-    int start_of_slice = sizeof(uint64_t) *
+    int start_of_slice = l2_entry_size(s) *
         (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
 
     return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
@@ -281,7 +281,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new l2 entry */
 
-    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
+    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
     if (l2_offset < 0) {
         ret = l2_offset;
         goto fail;
@@ -305,7 +305,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
 
     /* allocate a new entry in the l2 cache */
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
@@ -369,7 +369,7 @@ fail:
     }
     s->l1_table[l1_index] = old_l2_offset;
     if (l2_offset > 0) {
-        qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+        qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                             QCOW2_DISCARD_ALWAYS);
     }
     return ret;
@@ -718,7 +718,7 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
         /* Then decrease the refcount of the old table */
         if (l2_offset) {
-            qcow2_free_clusters(bs, l2_offset, s->l2_size * sizeof(uint64_t),
+            qcow2_free_clusters(bs, l2_offset, s->l2_size * l2_entry_size(s),
                                 QCOW2_DISCARD_OTHER);
         }
 
@@ -1919,7 +1919,7 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     int ret;
     int i, j;
 
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     if (!is_active_l1) {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 141e4fdcb1..3b89a97fd0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1254,7 +1254,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     l2_slice = NULL;
     l1_table = NULL;
     l1_size2 = l1_size * sizeof(uint64_t);
-    slice_size2 = s->l2_slice_size * sizeof(uint64_t);
+    slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
     s->cache_discards = true;
@@ -1605,7 +1605,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
     int i, l2_size, nb_csectors, ret;
 
     /* Read L2 table from disk */
-    l2_size = s->l2_size * sizeof(uint64_t);
+    l2_size = s->l2_size * l2_entry_size(s);
     l2_table = g_malloc(l2_size);
 
     ret = bdrv_pread(bs->file, l2_offset, l2_table, l2_size);
@@ -1680,15 +1680,16 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                             fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR",
                             offset);
                     if (fix & BDRV_FIX_ERRORS) {
+                        int idx = i * (l2_entry_size(s) / sizeof(uint64_t));
                         uint64_t l2e_offset =
-                            l2_offset + (uint64_t)i * sizeof(uint64_t);
+                            l2_offset + (uint64_t)i * l2_entry_size(s);
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
                         l2_entry = QCOW_OFLAG_ZERO;
                         set_l2_entry(s, l2_table, i, l2_entry);
                         ret = qcow2_pre_write_overlap_check(bs, ign,
-                                l2e_offset, sizeof(uint64_t), false);
+                                l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Overlap check failed\n");
                             res->check_errors++;
@@ -1698,7 +1699,8 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         }
 
                         ret = bdrv_pwrite_sync(bs->file, l2e_offset,
-                                               &l2_table[i], sizeof(uint64_t));
+                                               &l2_table[idx],
+                                               l2_entry_size(s));
                         if (ret < 0) {
                             fprintf(stderr, "ERROR: Failed to overwrite L2 "
                                     "table entry: %s\n", strerror(-ret));
@@ -1905,7 +1907,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         ret = bdrv_pread(bs->file, l2_offset, l2_table,
-                         s->l2_size * sizeof(uint64_t));
+                         s->l2_size * l2_entry_size(s));
         if (ret < 0) {
             fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
                     strerror(-ret));
diff --git a/block/qcow2.c b/block/qcow2.c
index 239e0ad3d9..d3b8581aed 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -870,7 +870,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     uint64_t max_l2_entries = DIV_ROUND_UP(virtual_disk_size, s->cluster_size);
     /* An L2 table is always one cluster in size so the max cache size
      * should be a multiple of the cluster size. */
-    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * sizeof(uint64_t),
+    uint64_t max_l2_cache = ROUND_UP(max_l2_entries * l2_entry_size(s),
                                      s->cluster_size);
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
@@ -1031,7 +1031,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
         }
     }
 
-    r->l2_slice_size = l2_cache_entry_size / sizeof(uint64_t);
+    r->l2_slice_size = l2_cache_entry_size / l2_entry_size(s);
     r->l2_table_cache = qcow2_cache_create(bs, l2_cache_size,
                                            l2_cache_entry_size);
     r->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size,
@@ -1425,7 +1425,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
         bs->encrypted = true;
     }
 
-    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
+    s->l2_bits = s->cluster_bits - ctz32(l2_entry_size(s));
     s->l2_size = 1 << s->l2_bits;
     /* 2^(s->refcount_order - 3) is the refcount width in bytes */
     s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
@@ -4104,7 +4104,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
          *  preallocation. All that matters is that we will not have to allocate
          *  new refcount structures for them.) */
         nb_new_l2_tables = DIV_ROUND_UP(nb_new_data_clusters,
-                                        s->cluster_size / sizeof(uint64_t));
+                                        s->cluster_size / l2_entry_size(s));
         /* The cluster range may not be aligned to L2 boundaries, so add one L2
          * table for a potential head/tail */
         nb_new_l2_tables++;
-- 
2.20.1




* [PATCH v4 12/30] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (10 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 11/30] qcow2: Add l2_entry_size() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-14  9:49   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
                   ` (18 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
64 bits for the subcluster allocation bitmap.

To support them correctly, get/set_l2_entry() need to take the entry
width into account when calculating the position of an entry within an
L2 slice.

This patch also adds the get/set_l2_bitmap() functions that are
used to access the bitmaps. For convenience we allow calling
get_l2_bitmap() on images without subclusters; it returns 0 in that
case and the caller should ignore the value.
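
As an illustration of the index arithmetic only, here is a standalone
sketch. DemoState and the demo_* helpers are hypothetical stand-ins for
BDRVQcow2State, l2_entry_size() and the be64 conversion helpers; they
are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BDRVQcow2State; only what the sketch needs. */
typedef struct {
    bool extended_l2;
} DemoState;

/* 8 bytes for a standard entry, 16 for an extended one. */
static size_t demo_l2_entry_size(const DemoState *s)
{
    return s->extended_l2 ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Portable big-endian store/load, standing in for cpu_to_be64()/be64_to_cpu(). */
static void demo_store_be64(uint64_t *slot, uint64_t v)
{
    uint8_t *p = (uint8_t *)slot;
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

static uint64_t demo_load_be64(const uint64_t *slot)
{
    const uint8_t *p = (const uint8_t *)slot;
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Convert an entry index into an index of 64-bit words in the slice. */
static size_t demo_word_index(const DemoState *s, int idx)
{
    return (size_t)idx * (demo_l2_entry_size(s) / sizeof(uint64_t));
}

static void demo_set_l2_entry(const DemoState *s, uint64_t *slice,
                              int idx, uint64_t entry)
{
    demo_store_be64(&slice[demo_word_index(s, idx)], entry);
}

static uint64_t demo_get_l2_entry(const DemoState *s, const uint64_t *slice,
                                  int idx)
{
    return demo_load_be64(&slice[demo_word_index(s, idx)]);
}

/* The bitmap is the second 64-bit word of an extended entry. */
static void demo_set_l2_bitmap(const DemoState *s, uint64_t *slice,
                               int idx, uint64_t bitmap)
{
    assert(s->extended_l2);
    demo_store_be64(&slice[demo_word_index(s, idx) + 1], bitmap);
}

static uint64_t demo_get_l2_bitmap(const DemoState *s, const uint64_t *slice,
                                   int idx)
{
    if (!s->extended_l2) {
        return 0; /* no bitmap in standard entries; value is meaningless */
    }
    return demo_load_be64(&slice[demo_word_index(s, idx) + 1]);
}
```

With extended entries, entry 2 occupies words 4 and 5 of the slice;
with standard entries it occupies word 2 alone.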

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 1eb4b46807..9611efbc52 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -518,15 +518,37 @@ static inline size_t l2_entry_size(BDRVQcow2State *s)
 static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                     int idx)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     return be64_to_cpu(l2_slice[idx]);
 }
 
+static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                     int idx)
+{
+    if (has_subclusters(s)) {
+        idx *= l2_entry_size(s) / sizeof(uint64_t);
+        return be64_to_cpu(l2_slice[idx + 1]);
+    } else {
+        /* For convenience only; the caller should ignore this value. */
+        return 0;
+    }
+}
+
 static inline void set_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
                                 int idx, uint64_t entry)
 {
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
     l2_slice[idx] = cpu_to_be64(entry);
 }
 
+static inline void set_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
+                                 int idx, uint64_t bitmap)
+{
+    assert(has_subclusters(s));
+    idx *= l2_entry_size(s) / sizeof(uint64_t);
+    l2_slice[idx + 1] = cpu_to_be64(bitmap);
+}
+
 static inline bool has_data_file(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-- 
2.20.1




* [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (11 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 12/30] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-08 11:23   ` Max Reitz
  2020-04-14 11:10   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset() Alberto Garcia
                   ` (17 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This patch adds QCow2SubclusterType, which is the subcluster-level
version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
same meaning as their QCOW2_CLUSTER_* equivalents (when they
exist). See below for details and caveats.

In images without extended L2 entries clusters are treated as having
exactly one subcluster so it is possible to replace one data type with
the other while keeping the exact same semantics.

With extended L2 entries there are new possible values, and every
subcluster in the same cluster can obviously have a different
QCow2SubclusterType so functions need to be adapted to work on the
subcluster level.

There are several things that have to be taken into account:

  a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
     compressed. We do not support compression at the subcluster
     level.

  b) There are two different values for unallocated subclusters:
     QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
     cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
     which means that the cluster is allocated but the subcluster is
     not. The latter can only happen in images with extended L2
     entries.

  c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
     entry has a value that violates the specification. The caller is
     responsible for handling these situations.

     To prevent compatibility problems with images that have invalid
     values but are currently being read by QEMU without causing side
     effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
     with extended L2 entries.
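
The rules above, for images with extended L2 entries, can be sketched
as a standalone function. This is only an illustration: the DEMO_*
names are hypothetical, the bit layout copies the QCOW_OFLAG_SUB_*
macros from this patch, and the two zero cluster types (which the
patch reports as invalid in such images) are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as QCOW_OFLAG_SUB_ZERO() / QCOW_OFLAG_SUB_ALLOC(). */
#define DEMO_SUB_ZERO(x)   ((1ULL << 32) << (x))
#define DEMO_SUB_ALLOC(x)  (1ULL << (x))

typedef enum {
    DEMO_CLUSTER_UNALLOCATED,
    DEMO_CLUSTER_NORMAL,
    DEMO_CLUSTER_COMPRESSED,
} DemoClusterType;

typedef enum {
    DEMO_SC_UNALLOCATED_PLAIN,
    DEMO_SC_UNALLOCATED_ALLOC,
    DEMO_SC_ZERO_PLAIN,
    DEMO_SC_ZERO_ALLOC,
    DEMO_SC_NORMAL,
    DEMO_SC_COMPRESSED,
    DEMO_SC_INVALID,
} DemoSubclusterType;

static DemoSubclusterType demo_subcluster_type(DemoClusterType type,
                                               uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    bool sc_zero  = bitmap & DEMO_SUB_ZERO(sc);
    bool sc_alloc = bitmap & DEMO_SUB_ALLOC(sc);

    /* (a) compression applies to the whole cluster; the bitmap is ignored */
    if (type == DEMO_CLUSTER_COMPRESSED) {
        return DEMO_SC_COMPRESSED;
    }
    /* (c) both bits set for the same subcluster violates the spec */
    if (sc_zero && sc_alloc) {
        return DEMO_SC_INVALID;
    }
    if (type == DEMO_CLUSTER_NORMAL) {
        if (sc_zero) {
            return DEMO_SC_ZERO_ALLOC;
        }
        /* (b) allocated cluster, but this subcluster may not be */
        return sc_alloc ? DEMO_SC_NORMAL : DEMO_SC_UNALLOCATED_ALLOC;
    }
    /* Unallocated cluster: an allocation bit makes no sense here. */
    if (sc_alloc) {
        return DEMO_SC_INVALID;
    }
    return sc_zero ? DEMO_SC_ZERO_PLAIN : DEMO_SC_UNALLOCATED_PLAIN;
}
```

For example, in an allocated cluster whose bitmap is
DEMO_SUB_ALLOC(0) | DEMO_SUB_ZERO(1), subcluster 0 holds data,
subcluster 1 reads as zeroes and subcluster 2 is unallocated.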

qcow2_cluster_to_subcluster_type() is added as a separate function
from qcow2_get_subcluster_type(), but this is only temporary and both
will be merged in a subsequent patch.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9611efbc52..52865787ee 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -80,6 +80,15 @@
 
 #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
 
+/* The subcluster X [0..31] reads as zeroes */
+#define QCOW_OFLAG_SUB_ZERO(X)    ((1ULL << 32) << (X))
+/* The subcluster X [0..31] is allocated */
+#define QCOW_OFLAG_SUB_ALLOC(X)   (1ULL << (X))
+/* L2 entry bitmap with all "read as zeroes" bits set */
+#define QCOW_L2_BITMAP_ALL_ZEROES 0xFFFFFFFF00000000ULL
+/* L2 entry bitmap with all allocation bits set */
+#define QCOW_L2_BITMAP_ALL_ALLOC  0x00000000FFFFFFFFULL
+
 /* Size of normal and extended L2 entries */
 #define L2E_SIZE_NORMAL   (sizeof(uint64_t))
 #define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
@@ -447,6 +456,33 @@ typedef struct QCowL2Meta
     QLIST_ENTRY(QCowL2Meta) next_in_flight;
 } QCowL2Meta;
 
+/*
+ * In images with standard L2 entries all clusters are treated as if
+ * they had one subcluster so QCow2ClusterType and QCow2SubclusterType
+ * can be mapped to each other and have the exact same meaning
+ * (QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC cannot happen in these images).
+ *
+ * In images with extended L2 entries QCow2ClusterType refers to the
+ * complete cluster and QCow2SubclusterType to each of the individual
+ * subclusters, so there are several possible combinations:
+ *
+ *     |--------------+---------------------------|
+ *     | Cluster type | Possible subcluster types |
+ *     |--------------+---------------------------|
+ *     | UNALLOCATED  |         UNALLOCATED_PLAIN |
+ *     |              |                ZERO_PLAIN |
+ *     |--------------+---------------------------|
+ *     | NORMAL       |         UNALLOCATED_ALLOC |
+ *     |              |                ZERO_ALLOC |
+ *     |              |                    NORMAL |
+ *     |--------------+---------------------------|
+ *     | COMPRESSED   |                COMPRESSED |
+ *     |--------------+---------------------------|
+ *
+ * QCOW2_SUBCLUSTER_INVALID means that the L2 entry is incorrect and
+ * the image should be marked corrupt.
+ */
+
 typedef enum QCow2ClusterType {
     QCOW2_CLUSTER_UNALLOCATED,
     QCOW2_CLUSTER_ZERO_PLAIN,
@@ -455,6 +491,16 @@ typedef enum QCow2ClusterType {
     QCOW2_CLUSTER_COMPRESSED,
 } QCow2ClusterType;
 
+typedef enum QCow2SubclusterType {
+    QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
+    QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
+    QCOW2_SUBCLUSTER_ZERO_PLAIN,
+    QCOW2_SUBCLUSTER_ZERO_ALLOC,
+    QCOW2_SUBCLUSTER_NORMAL,
+    QCOW2_SUBCLUSTER_COMPRESSED,
+    QCOW2_SUBCLUSTER_INVALID,
+} QCow2SubclusterType;
+
 typedef enum QCow2MetadataOverlap {
     QCOW2_OL_MAIN_HEADER_BITNR      = 0,
     QCOW2_OL_ACTIVE_L1_BITNR        = 1,
@@ -632,6 +678,80 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }
 
+/*
+ * For an image without extended L2 entries, return the
+ * QCow2SubclusterType equivalent of a given QCow2ClusterType.
+ */
+static inline
+QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
+{
+    switch (type) {
+    case QCOW2_CLUSTER_COMPRESSED:
+        return QCOW2_SUBCLUSTER_COMPRESSED;
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+        return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+    case QCOW2_CLUSTER_NORMAL:
+        return QCOW2_SUBCLUSTER_NORMAL;
+    case QCOW2_CLUSTER_UNALLOCATED:
+        return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+/*
+ * In an image without subclusters @l2_bitmap is ignored and
+ * @sc_index must be 0.
+ */
+static inline
+QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
+                                              uint64_t l2_entry,
+                                              uint64_t l2_bitmap,
+                                              unsigned sc_index)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
+    assert(sc_index < s->subclusters_per_cluster);
+
+    if (has_subclusters(s)) {
+        bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
+        bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            return QCOW2_SUBCLUSTER_COMPRESSED;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_SUBCLUSTER_INVALID;
+        case QCOW2_CLUSTER_NORMAL:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_SUBCLUSTER_NORMAL;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_SUBCLUSTER_INVALID;
+            }
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (!sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+            } else if (!sc_zero && sc_alloc) {
+                return QCOW2_SUBCLUSTER_INVALID;
+            } else if (sc_zero && !sc_alloc) {
+                return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+            } else { /* sc_zero && sc_alloc */
+                return QCOW2_SUBCLUSTER_INVALID;
+            }
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        return qcow2_cluster_to_subcluster_type(type);
+    }
+}
+
 /* Check whether refcounts are eager or lazy */
 static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
 {
-- 
2.20.1




* [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (12 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-08 12:15   ` Max Reitz
  2020-04-14 12:30   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
                   ` (16 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This function returns an integer that can be either an error code or a
cluster type (a value from the QCow2ClusterType enum).

We are going to start using subcluster types instead of cluster types
in some functions so it's better to use the exact data types instead
of integers for clarity and in order to detect errors more easily.

This patch makes qcow2_get_host_offset() return 0 on success and
puts the returned cluster type in a separate parameter. There are no
semantic changes.
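
The calling convention can be sketched in isolation as follows. The
names and the toy lookup rule are hypothetical and exist only to show
the shape of the interface: the return value carries 0 or -errno, and
the type is written through a typed pointer on success only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    DEMO_CLUSTER_UNALLOCATED, /* value 0, like the first QCow2ClusterType */
    DEMO_CLUSTER_NORMAL,
} DemoClusterType;

/*
 * Toy lookup (an assumption for illustration): offsets at or beyond
 * image_size are an error, and clusters with an odd index count as
 * allocated. Returns 0 on success and -errno on failure; *type is
 * only written on success.
 */
static int demo_get_host_offset(uint64_t offset, uint64_t image_size,
                                uint64_t cluster_size, DemoClusterType *type)
{
    assert(type != NULL && cluster_size > 0);

    if (offset >= image_size) {
        return -EINVAL;
    }
    *type = ((offset / cluster_size) & 1) ? DEMO_CLUSTER_NORMAL
                                          : DEMO_CLUSTER_UNALLOCATED;
    return 0;
}
```

A caller checks "ret < 0" first and only then looks at the type, so
the compiler sees an enum rather than an int that might also hold an
error code.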

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h         |  3 ++-
 block/qcow2-cluster.c | 11 +++++++----
 block/qcow2.c         | 37 ++++++++++++++++++++++---------------
 3 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 52865787ee..6b7b286b91 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -859,7 +859,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
                           uint8_t *buf, int nb_sectors, bool enc, Error **errp);
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-                          unsigned int *bytes, uint64_t *host_offset);
+                          unsigned int *bytes, uint64_t *host_offset,
+                          QCow2ClusterType *cluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
                                unsigned int *bytes, uint64_t *host_offset,
                                QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 41a23c5305..acfcf8ea4c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -514,13 +514,14 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
  * cluster type and (if applicable) are stored contiguously in the image file.
+ * The cluster type is stored in *cluster_type.
  * Compressed clusters are always returned one by one.
  *
- * Returns the cluster type (QCOW2_CLUSTER_*) on success, -errno in error
- * cases.
+ * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
-                          unsigned int *bytes, uint64_t *host_offset)
+                          unsigned int *bytes, uint64_t *host_offset,
+                          QCow2ClusterType *cluster_type)
 {
     BDRVQcow2State *s = bs->opaque;
     unsigned int l2_index;
@@ -663,7 +664,9 @@ out:
     assert(bytes_available - offset_in_cluster <= UINT_MAX);
     *bytes = bytes_available - offset_in_cluster;
 
-    return type;
+    *cluster_type = type;
+
+    return 0;
 
 fail:
     qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
diff --git a/block/qcow2.c b/block/qcow2.c
index d3b8581aed..48e188152c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1971,6 +1971,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     BDRVQcow2State *s = bs->opaque;
     uint64_t host_offset;
     unsigned int bytes;
+    QCow2ClusterType type;
     int ret, status = 0;
 
     qemu_co_mutex_lock(&s->lock);
@@ -1982,7 +1983,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     }
 
     bytes = MIN(INT_MAX, count);
-    ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset);
+    ret = qcow2_get_host_offset(bs, offset, &bytes, &host_offset, &type);
     qemu_co_mutex_unlock(&s->lock);
     if (ret < 0) {
         return ret;
@@ -1990,15 +1991,15 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
     *pnum = bytes;
 
-    if ((ret == QCOW2_CLUSTER_NORMAL || ret == QCOW2_CLUSTER_ZERO_ALLOC) &&
+    if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
         !s->crypto) {
         *map = host_offset;
         *file = s->data_file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID;
     }
-    if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
+    if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+    } else if (type != QCOW2_CLUSTER_UNALLOCATED) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2207,6 +2208,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
     int ret = 0;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t host_offset = 0;
+    QCow2ClusterType type;
     AioTaskPool *aio = NULL;
 
     while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2218,22 +2220,23 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         }
 
         qemu_co_mutex_lock(&s->lock);
-        ret = qcow2_get_host_offset(bs, offset, &cur_bytes, &host_offset);
+        ret = qcow2_get_host_offset(bs, offset, &cur_bytes,
+                                    &host_offset, &type);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
             goto out;
         }
 
-        if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
-            ret == QCOW2_CLUSTER_ZERO_ALLOC ||
-            (ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+        if (type == QCOW2_CLUSTER_ZERO_PLAIN ||
+            type == QCOW2_CLUSTER_ZERO_ALLOC ||
+            (type == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
             if (!aio && cur_bytes != bytes) {
                 aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
             }
-            ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, ret,
+            ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, type,
                                  host_offset, offset, cur_bytes,
                                  qiov, qiov_offset, NULL);
             if (ret < 0) {
@@ -3716,6 +3719,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     if (head || tail) {
         uint64_t off;
         unsigned int nr;
+        QCow2ClusterType type;
 
         assert(head + bytes <= s->cluster_size);
 
@@ -3731,10 +3735,11 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
         bytes = s->cluster_size;
         nr = s->cluster_size;
-        ret = qcow2_get_host_offset(bs, offset, &nr, &off);
-        if (ret != QCOW2_CLUSTER_UNALLOCATED &&
-            ret != QCOW2_CLUSTER_ZERO_PLAIN &&
-            ret != QCOW2_CLUSTER_ZERO_ALLOC) {
+        ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
+        if (ret < 0 ||
+            (type != QCOW2_CLUSTER_UNALLOCATED &&
+             type != QCOW2_CLUSTER_ZERO_PLAIN &&
+             type != QCOW2_CLUSTER_ZERO_ALLOC)) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
@@ -3792,16 +3797,18 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
     while (bytes != 0) {
         uint64_t copy_offset = 0;
+        QCow2ClusterType type;
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
         cur_write_flags = write_flags;
 
-        ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes, &copy_offset);
+        ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes,
+                                    &copy_offset, &type);
         if (ret < 0) {
             goto out;
         }
 
-        switch (ret) {
+        switch (type) {
         case QCOW2_CLUSTER_UNALLOCATED:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
-- 
2.20.1




* [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (13 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-08 12:42   ` Max Reitz
  2020-04-15  7:10   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 16/30] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
                   ` (15 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

In order to support extended L2 entries some functions of the qcow2
driver need to start dealing with subclusters instead of clusters.

qcow2_get_host_offset() is modified to return the subcluster type
instead of the cluster type, and all callers are updated to replace
all values of QCow2ClusterType with their QCow2SubclusterType
equivalents.

This patch only changes the data types; there are no semantic changes.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h         |  2 +-
 block/qcow2-cluster.c | 10 +++----
 block/qcow2.c         | 70 ++++++++++++++++++++++---------------------
 3 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 6b7b286b91..e6fbb7d987 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -860,7 +860,7 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
                           unsigned int *bytes, uint64_t *host_offset,
-                          QCow2ClusterType *cluster_type);
+                          QCow2SubclusterType *subcluster_type);
 int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
                                unsigned int *bytes, uint64_t *host_offset,
                                QCowL2Meta **m);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index acfcf8ea4c..8cdf8a23b6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -513,15 +513,15 @@ static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  * offset that we are interested in.
  *
  * On exit, *bytes is the number of bytes starting at offset that have the same
- * cluster type and (if applicable) are stored contiguously in the image file.
- * The cluster type is stored in *cluster_type.
- * Compressed clusters are always returned one by one.
+ * subcluster type and (if applicable) are stored contiguously in the image
+ * file. The subcluster type is stored in *subcluster_type.
+ * Compressed clusters are always processed one by one.
  *
  * Returns 0 on success, -errno in error cases.
  */
 int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
                           unsigned int *bytes, uint64_t *host_offset,
-                          QCow2ClusterType *cluster_type)
+                          QCow2SubclusterType *subcluster_type)
 {
     BDRVQcow2State *s = bs->opaque;
     unsigned int l2_index;
@@ -664,7 +664,7 @@ out:
     assert(bytes_available - offset_in_cluster <= UINT_MAX);
     *bytes = bytes_available - offset_in_cluster;
 
-    *cluster_type = type;
+    *subcluster_type = qcow2_cluster_to_subcluster_type(type);
 
     return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 48e188152c..f8788d6305 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1971,7 +1971,7 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     BDRVQcow2State *s = bs->opaque;
     uint64_t host_offset;
     unsigned int bytes;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
     int ret, status = 0;
 
     qemu_co_mutex_lock(&s->lock);
@@ -1991,15 +1991,16 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
 
     *pnum = bytes;
 
-    if ((type == QCOW2_CLUSTER_NORMAL || type == QCOW2_CLUSTER_ZERO_ALLOC) &&
-        !s->crypto) {
+    if ((type == QCOW2_SUBCLUSTER_NORMAL ||
+         type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
         *map = host_offset;
         *file = s->data_file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID;
     }
-    if (type == QCOW2_CLUSTER_ZERO_PLAIN || type == QCOW2_CLUSTER_ZERO_ALLOC) {
+    if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+        type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (type != QCOW2_CLUSTER_UNALLOCATED) {
+    } else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2096,7 +2097,7 @@ typedef struct Qcow2AioTask {
     AioTask task;
 
     BlockDriverState *bs;
-    QCow2ClusterType cluster_type; /* only for read */
+    QCow2SubclusterType subcluster_type; /* only for read */
     uint64_t host_offset; /* or full descriptor in compressed clusters */
     uint64_t offset;
     uint64_t bytes;
@@ -2109,7 +2110,7 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task);
 static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
                                        AioTaskPool *pool,
                                        AioTaskFunc func,
-                                       QCow2ClusterType cluster_type,
+                                       QCow2SubclusterType subcluster_type,
                                        uint64_t host_offset,
                                        uint64_t offset,
                                        uint64_t bytes,
@@ -2123,7 +2124,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
     *task = (Qcow2AioTask) {
         .task.func = func,
         .bs = bs,
-        .cluster_type = cluster_type,
+        .subcluster_type = subcluster_type,
         .qiov = qiov,
         .host_offset = host_offset,
         .offset = offset,
@@ -2134,7 +2135,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 
     trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
                          func == qcow2_co_preadv_task_entry ? "read" : "write",
-                         cluster_type, host_offset, offset, bytes,
+                         subcluster_type, host_offset, offset, bytes,
                          qiov, qiov_offset);
 
     if (!pool) {
@@ -2147,7 +2148,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
 }
 
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
-                                             QCow2ClusterType cluster_type,
+                                             QCow2SubclusterType subc_type,
                                              uint64_t host_offset,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
@@ -2155,24 +2156,24 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
 {
     BDRVQcow2State *s = bs->opaque;
 
-    switch (cluster_type) {
-    case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_ZERO_ALLOC:
+    switch (subc_type) {
+    case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
         /* Both zero types are handled in qcow2_co_preadv_part */
         g_assert_not_reached();
 
-    case QCOW2_CLUSTER_UNALLOCATED:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
         assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
         return bdrv_co_preadv_part(bs->backing, offset, bytes,
                                    qiov, qiov_offset, 0);
 
-    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_SUBCLUSTER_COMPRESSED:
         return qcow2_co_preadv_compressed(bs, host_offset,
                                           offset, bytes, qiov, qiov_offset);
 
-    case QCOW2_CLUSTER_NORMAL:
+    case QCOW2_SUBCLUSTER_NORMAL:
         if (bs->encrypted) {
             return qcow2_co_preadv_encrypted(bs, host_offset,
                                              offset, bytes, qiov, qiov_offset);
@@ -2195,8 +2196,9 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
 
     assert(!t->l2meta);
 
-    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->host_offset,
-                                t->offset, t->bytes, t->qiov, t->qiov_offset);
+    return qcow2_co_preadv_task(t->bs, t->subcluster_type,
+                                t->host_offset, t->offset, t->bytes,
+                                t->qiov, t->qiov_offset);
 }
 
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
@@ -2208,7 +2210,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
     int ret = 0;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t host_offset = 0;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
     AioTaskPool *aio = NULL;
 
     while (bytes != 0 && aio_task_pool_status(aio) == 0) {
@@ -2227,9 +2229,9 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
             goto out;
         }
 
-        if (type == QCOW2_CLUSTER_ZERO_PLAIN ||
-            type == QCOW2_CLUSTER_ZERO_ALLOC ||
-            (type == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+        if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+            type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+            (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
@@ -2463,7 +2465,7 @@ static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
 {
     Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
 
-    assert(!t->cluster_type);
+    assert(!t->subcluster_type);
 
     return qcow2_co_pwritev_task(t->bs, t->host_offset,
                                  t->offset, t->bytes, t->qiov, t->qiov_offset,
@@ -3719,7 +3721,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     if (head || tail) {
         uint64_t off;
         unsigned int nr;
-        QCow2ClusterType type;
+        QCow2SubclusterType type;
 
         assert(head + bytes <= s->cluster_size);
 
@@ -3737,9 +3739,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         nr = s->cluster_size;
         ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
         if (ret < 0 ||
-            (type != QCOW2_CLUSTER_UNALLOCATED &&
-             type != QCOW2_CLUSTER_ZERO_PLAIN &&
-             type != QCOW2_CLUSTER_ZERO_ALLOC)) {
+            (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+             type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
+             type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
@@ -3797,7 +3799,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
     while (bytes != 0) {
         uint64_t copy_offset = 0;
-        QCow2ClusterType type;
+        QCow2SubclusterType type;
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
         cur_write_flags = write_flags;
@@ -3809,7 +3811,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
         }
 
         switch (type) {
-        case QCOW2_CLUSTER_UNALLOCATED:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
@@ -3824,16 +3826,16 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
             }
             break;
 
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
             cur_write_flags |= BDRV_REQ_ZERO_WRITE;
             break;
 
-        case QCOW2_CLUSTER_COMPRESSED:
+        case QCOW2_SUBCLUSTER_COMPRESSED:
             ret = -ENOTSUP;
             goto out;
 
-        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_NORMAL:
             child = s->data_file;
             break;
 
@@ -4289,7 +4291,7 @@ static coroutine_fn int qcow2_co_pwritev_compressed_task_entry(AioTask *task)
 {
     Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
 
-    assert(!t->cluster_type && !t->l2meta);
+    assert(!t->subcluster_type && !t->l2meta);
 
     return qcow2_co_pwritev_compressed_task(t->bs, t->offset, t->bytes, t->qiov,
                                             t->qiov_offset);
-- 
2.20.1




* [PATCH v4 16/30] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (14 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-15  7:28   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
                   ` (14 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

When dealing with subcluster types there is a new value called
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
QCow2ClusterType.

This patch handles that value in all places where subcluster types
are processed.
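
As a rough illustration of the rule applied in the hunks below, here is a
standalone sketch. The enum and helper names are hypothetical stand-ins
for QCow2SubclusterType and the open-coded conditions in block/qcow2.c;
this is not code from the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the subcluster types used in this series. */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,   /* the new value: unallocated subcluster in an
                             * allocated cluster */
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_COMPRESSED,
} SubclusterType;

/* A read can be served by zero-filling the buffer. */
static bool sc_reads_as_zero(SubclusterType t, bool has_backing)
{
    switch (t) {
    case SC_ZERO_PLAIN:
    case SC_ZERO_ALLOC:
        return true;
    case SC_UNALLOCATED_PLAIN:
    case SC_UNALLOCATED_ALLOC:
        /* Both unallocated variants fall through to the backing file */
        return !has_backing;
    default:
        return false;
    }
}

/* The host offset is valid for block status (encryption not modelled). */
static bool sc_has_host_offset(SubclusterType t)
{
    return t == SC_NORMAL || t == SC_ZERO_ALLOC || t == SC_UNALLOCATED_ALLOC;
}

/* The subcluster counts as data: neither zero nor unallocated. */
static bool sc_has_data(SubclusterType t)
{
    return t == SC_NORMAL || t == SC_COMPRESSED;
}
```

In other words, the new type behaves like UNALLOCATED_PLAIN for reads and
like the other *_ALLOC types wherever a host offset is reported.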

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f8788d6305..88daaf11a0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1992,7 +1992,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     *pnum = bytes;
 
     if ((type == QCOW2_SUBCLUSTER_NORMAL ||
-         type == QCOW2_SUBCLUSTER_ZERO_ALLOC) && !s->crypto) {
+         type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
+         type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) && !s->crypto) {
         *map = host_offset;
         *file = s->data_file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID;
@@ -2000,7 +2001,8 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
         type == QCOW2_SUBCLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
-    } else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN) {
+    } else if (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+               type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC) {
         status |= BDRV_BLOCK_DATA;
     }
     if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
@@ -2163,6 +2165,7 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
         g_assert_not_reached();
 
     case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
         assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
 
         BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
@@ -2231,7 +2234,8 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
 
         if (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
             type == QCOW2_SUBCLUSTER_ZERO_ALLOC ||
-            (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing))
+            (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN && !bs->backing) ||
+            (type == QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC && !bs->backing))
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
@@ -3740,6 +3744,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
         if (ret < 0 ||
             (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
+             type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
              type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
              type != QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
             qemu_co_mutex_unlock(&s->lock);
@@ -3812,6 +3817,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
 
         switch (type) {
         case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
             if (bs->backing && bs->backing->bs) {
                 int64_t backing_length = bdrv_getlength(bs->backing->bs);
                 if (src_offset >= backing_length) {
-- 
2.20.1




* [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (15 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 16/30] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-15  8:39   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset() Alberto Garcia
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

If an image has subclusters then there are more copy-on-write
scenarios that we need to consider. Let's say we have a write request
from the middle of subcluster #3 until the end of the cluster:

   - If the cluster is new, then subclusters #0 to #3 from the old
     cluster must be copied into the new one.

   - If the cluster is new but the old cluster was unallocated, then
     only subcluster #3 needs copy-on-write. #0 to #2 are marked as
     unallocated in the bitmap of the new L2 entry.

   - If we are overwriting an old cluster and subcluster #3 is
     unallocated or has the all-zeroes bit set then we need
     copy-on-write on subcluster #3.

   - If we are overwriting an old cluster and subcluster #3 was
     allocated then there is no need to copy-on-write.
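
The four cases above can be sketched for the head of the request as
follows. This is a simplified standalone model, not the patch itself: the
enum, the function name and the byte-offset parameters are illustrative
assumptions, and the tail (cow_end_to) is symmetric.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the subcluster types used in this series. */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_COMPRESSED,
} SubclusterType;

/*
 * Return the offset (within the cluster) where the head COW region
 * starts. @type is the type of the subcluster that contains
 * @write_start in the old cluster.
 */
static unsigned cow_start_from(SubclusterType type, bool keep_old,
                               unsigned write_start,
                               unsigned subcluster_size)
{
    unsigned sc_start = write_start - (write_start % subcluster_size);

    if (!keep_old) {
        switch (type) {
        case SC_ZERO_PLAIN:
        case SC_UNALLOCATED_PLAIN:
            /* No old data cluster: only the touched subcluster */
            return sc_start;
        default:
            /* Old cluster had a host cluster: copy from its start */
            return 0;
        }
    }

    switch (type) {
    case SC_NORMAL:
        return write_start;         /* nothing to copy */
    case SC_ZERO_ALLOC:
    case SC_UNALLOCATED_ALLOC:
        return sc_start;            /* fill the touched subcluster */
    default:
        assert(!"unexpected type for an overwritten cluster");
        return 0;
    }
}
```

With 2 KiB subclusters, a write starting in the middle of subcluster #3
begins at byte 7168 and that subcluster begins at byte 6144, so the four
cases give 0, 6144, 6144 and 7168 respectively.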

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c | 140 +++++++++++++++++++++++++++++++++---------
 1 file changed, 110 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8cdf8a23b6..c6f3cc9237 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1061,56 +1061,128 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
  * If @keep_old is true it means that the clusters were already
  * allocated and will be overwritten. If false then the clusters are
  * new and we have to decrease the reference count of the old ones.
+ *
+ * Returns 1 on success, -errno on failure (in order to match the
+ * return value of handle_copied() and handle_alloc()).
  */
-static void calculate_l2_meta(BlockDriverState *bs,
-                              uint64_t host_cluster_offset,
-                              uint64_t guest_offset, unsigned bytes,
-                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
+static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_cluster_offset,
+                             uint64_t guest_offset, unsigned bytes,
+                             uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
 {
     BDRVQcow2State *s = bs->opaque;
-    int l2_index = offset_to_l2_slice_index(s, guest_offset);
-    uint64_t l2_entry;
+    int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
+    uint64_t l2_entry, l2_bitmap;
     unsigned cow_start_from, cow_end_to;
     unsigned cow_start_to = offset_into_cluster(s, guest_offset);
     unsigned cow_end_from = cow_start_to + bytes;
     unsigned nb_clusters = size_to_clusters(s, cow_end_from);
     QCowL2Meta *old_m = *m;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
 
     assert(nb_clusters <= s->l2_slice_size - l2_index);
 
-    /* Return if there's no COW (all clusters are normal and we keep them) */
+    /* Return if there's no COW (all subclusters are normal and we are
+     * keeping the clusters) */
     if (keep_old) {
+        unsigned first_sc = cow_start_to / s->subcluster_size;
+        unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
         int i;
-        for (i = 0; i < nb_clusters; i++) {
-            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
-            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
+        for (i = first_sc; i <= last_sc; i++) {
+            unsigned c = i / s->subclusters_per_cluster;
+            unsigned sc = i % s->subclusters_per_cluster;
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
+            l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
+            type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
+            if (type == QCOW2_SUBCLUSTER_INVALID) {
+                l2_index += c; /* Point to the invalid entry */
+                goto fail;
+            }
+            if (type != QCOW2_SUBCLUSTER_NORMAL) {
                 break;
             }
         }
-        if (i == nb_clusters) {
-            return;
+        if (i == last_sc + 1) {
+            return 1;
         }
     }
 
     /* Get the L2 entry of the first cluster */
     l2_entry = get_l2_entry(s, l2_slice, l2_index);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_start_from = cow_start_to;
+    if (type == QCOW2_SUBCLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_COMPRESSED:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_start_from = 0;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_start_from = 0;
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+            cow_start_from = cow_start_to;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_start_from = sc_index << s->subcluster_bits;
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     /* Get the L2 entry of the last cluster */
-    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
-    type = qcow2_get_cluster_type(bs, l2_entry);
+    l2_index += nb_clusters - 1;
+    l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
 
-    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
-        cow_end_to = cow_end_from;
+    if (type == QCOW2_SUBCLUSTER_INVALID) {
+        goto fail;
+    }
+
+    if (!keep_old) {
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+        case QCOW2_SUBCLUSTER_COMPRESSED:
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     } else {
-        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
+        switch (type) {
+        case QCOW2_SUBCLUSTER_NORMAL:
+            cow_end_to = cow_end_from;
+            break;
+        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
+            break;
+        default:
+            g_assert_not_reached();
+        }
     }
 
     *m = g_malloc0(sizeof(**m));
@@ -1135,6 +1207,18 @@ static void calculate_l2_meta(BlockDriverState *bs,
 
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
+
+fail:
+    if (type == QCOW2_SUBCLUSTER_INVALID) {
+        uint64_t l1_index = offset_to_l1_index(s, guest_offset);
+        uint64_t l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found"
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        return -EIO;
+    }
+
+    return 1;
 }
 
 /*
@@ -1352,10 +1436,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
                  - offset_into_cluster(s, guest_offset));
         assert(*bytes != 0);
 
-        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
-                          *bytes, l2_slice, m, true);
-
-        ret = 1;
+        ret = calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK,
+                                guest_offset, *bytes, l2_slice, m, true);
     } else {
         ret = 0;
     }
@@ -1530,10 +1612,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
     assert(*bytes != 0);
 
-    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
-                      m, false);
-
-    ret = 1;
+    ret = calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
+                            l2_slice, m, false);
 
 out:
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1




* [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (16 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-08 12:49   ` Max Reitz
  2020-04-22  8:07   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
                   ` (12 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The logic of this function remains pretty much the same, except that
it uses count_contiguous_subclusters(), which combines the logic of
count_contiguous_clusters() / count_contiguous_clusters_unallocated()
and checks individual subclusters.
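
To make the counting rule concrete, here is a small standalone model of
it. The ClusterModel struct, the fixed SC_PER_CLUSTER value and the
function name are assumptions made for this sketch; the real function
reads L2 entries and bitmaps from the L2 slice instead.

```c
#include <stdbool.h>
#include <stdint.h>

enum { SC_PER_CLUSTER = 4 };    /* kept small for illustration only */

/* Hypothetical mirror of the subcluster types used in this series. */
typedef enum {
    SC_UNALLOCATED_PLAIN,
    SC_UNALLOCATED_ALLOC,
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_COMPRESSED,
} SubclusterType;

/* One decoded L2 entry: host offset plus per-subcluster types. */
typedef struct {
    uint64_t host_offset;
    SubclusterType sc[SC_PER_CLUSTER];
} ClusterModel;

/*
 * Count subclusters of the same type as c[0].sc[sc_index], scanning at
 * most @nb_clusters clusters. Types that have a host cluster must also
 * be contiguous in the image file.
 */
static int count_contiguous(const ClusterModel *c, int nb_clusters,
                            unsigned sc_index, uint64_t cluster_size)
{
    SubclusterType type = c[0].sc[sc_index];
    uint64_t expected = c[0].host_offset;
    bool check_offset = type != SC_UNALLOCATED_PLAIN &&
                        type != SC_ZERO_PLAIN;
    int count = 0;

    if (type == SC_COMPRESSED) {
        /* Compressed clusters are processed one by one */
        return SC_PER_CLUSTER - sc_index;
    }

    for (int i = 0; i < nb_clusters; i++) {
        if (check_offset && c[i].host_offset != expected) {
            return count;
        }
        for (unsigned j = (i == 0) ? sc_index : 0; j < SC_PER_CLUSTER; j++) {
            if (c[i].sc[j] != type) {
                return count;
            }
            count++;
        }
        expected += cluster_size;
    }
    return count;
}
```

The caller then converts the result into bytes counted from the start of
the first cluster, which is why the patch computes
(sc + sc_index) << s->subcluster_bits.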

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.h         |  38 +++++------
 block/qcow2-cluster.c | 143 +++++++++++++++++++++---------------------
 2 files changed, 85 insertions(+), 96 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index e6fbb7d987..031ce823b3 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -678,29 +678,6 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,
     }
 }
 
-/*
- * For an image without extended L2 entries, return the
- * QCow2SubclusterType equivalent of a given QCow2ClusterType.
- */
-static inline
-QCow2SubclusterType qcow2_cluster_to_subcluster_type(QCow2ClusterType type)
-{
-    switch (type) {
-    case QCOW2_CLUSTER_COMPRESSED:
-        return QCOW2_SUBCLUSTER_COMPRESSED;
-    case QCOW2_CLUSTER_ZERO_PLAIN:
-        return QCOW2_SUBCLUSTER_ZERO_PLAIN;
-    case QCOW2_CLUSTER_ZERO_ALLOC:
-        return QCOW2_SUBCLUSTER_ZERO_ALLOC;
-    case QCOW2_CLUSTER_NORMAL:
-        return QCOW2_SUBCLUSTER_NORMAL;
-    case QCOW2_CLUSTER_UNALLOCATED:
-        return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
-    default:
-        g_assert_not_reached();
-    }
-}
-
 /*
  * In an image without subsclusters @l2_bitmap is ignored and
  * @sc_index must be 0.
@@ -748,7 +725,20 @@ QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
             g_assert_not_reached();
         }
     } else {
-        return qcow2_cluster_to_subcluster_type(type);
+        switch (type) {
+        case QCOW2_CLUSTER_COMPRESSED:
+            return QCOW2_SUBCLUSTER_COMPRESSED;
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+            return QCOW2_SUBCLUSTER_ZERO_PLAIN;
+        case QCOW2_CLUSTER_ZERO_ALLOC:
+            return QCOW2_SUBCLUSTER_ZERO_ALLOC;
+        case QCOW2_CLUSTER_NORMAL:
+            return QCOW2_SUBCLUSTER_NORMAL;
+        case QCOW2_CLUSTER_UNALLOCATED:
+            return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
+        default:
+            g_assert_not_reached();
+        }
     }
 }
 
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c6f3cc9237..6f2643ba53 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -376,66 +376,58 @@ fail:
 }
 
 /*
- * Checks how many clusters in a given L2 slice are contiguous in the image
- * file. As soon as one of the flags in the bitmask stop_flags changes compared
- * to the first cluster, the search is stopped and the cluster is not counted
- * as contiguous. (This allows it, for example, to stop at the first compressed
- * cluster which may require a different handling)
+ * Return the number of contiguous subclusters of the exact same type
+ * in a given L2 slice, starting from cluster @l2_index, subcluster
+ * @sc_index. Allocated subclusters are required to be contiguous in
+ * the image file.
+ * At most @nb_clusters are checked (note that this means clusters,
+ * not subclusters).
+ * Compressed clusters are always processed one by one but for the
+ * purpose of this count they are treated as if they were divided into
+ * subclusters of size s->subcluster_size.
  */
-static int count_contiguous_clusters(BlockDriverState *bs, int nb_clusters,
-        int cluster_size, uint64_t *l2_slice, int l2_index, uint64_t stop_flags)
+static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
+                                        unsigned sc_index, uint64_t *l2_slice,
+                                        int l2_index)
 {
     BDRVQcow2State *s = bs->opaque;
-    int i;
-    QCow2ClusterType first_cluster_type;
-    uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
-    uint64_t first_entry = get_l2_entry(s, l2_slice, l2_index);
-    uint64_t offset = first_entry & mask;
-
-    first_cluster_type = qcow2_get_cluster_type(bs, first_entry);
-    if (first_cluster_type == QCOW2_CLUSTER_UNALLOCATED) {
-        return 0;
+    int i, j, count = 0;
+    uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
+    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
+    bool check_offset = true;
+    QCow2SubclusterType type =
+        qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+
+    assert(type != QCOW2_SUBCLUSTER_INVALID); /* The caller should check this */
+
+    if (type == QCOW2_SUBCLUSTER_COMPRESSED) {
+        /* Compressed clusters are always processed one by one */
+        return s->subclusters_per_cluster - sc_index;
     }
 
-    /* must be allocated */
-    assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-           first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
-
-    for (i = 0; i < nb_clusters; i++) {
-        uint64_t l2_entry = get_l2_entry(s, l2_slice, l2_index + i) & mask;
-        if (offset + (uint64_t) i * cluster_size != l2_entry) {
-            break;
-        }
+    if (type == QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN ||
+        type == QCOW2_SUBCLUSTER_ZERO_PLAIN) {
+        check_offset = false;
     }
 
-        return i;
-}
-
-/*
- * Checks how many consecutive unallocated clusters in a given L2
- * slice have the same cluster type.
- */
-static int count_contiguous_clusters_unallocated(BlockDriverState *bs,
-                                                 int nb_clusters,
-                                                 uint64_t *l2_slice,
-                                                 int l2_index,
-                                                 QCow2ClusterType wanted_type)
-{
-    BDRVQcow2State *s = bs->opaque;
-    int i;
-
-    assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
-           wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-        QCow2ClusterType type = qcow2_get_cluster_type(bs, entry);
-
-        if (type != wanted_type) {
-            break;
+        l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
+        l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+        if (check_offset && expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
+            goto out;
+        }
+        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
+            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
+                goto out;
+            }
+            count++;
         }
+        expected_offset += s->cluster_size;
     }
 
-    return i;
+out:
+    return count;
 }
 
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
@@ -524,12 +516,12 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
                           QCow2SubclusterType *subcluster_type)
 {
     BDRVQcow2State *s = bs->opaque;
-    unsigned int l2_index;
-    uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
-    int c;
+    unsigned int l2_index, sc_index;
+    uint64_t l1_index, l2_offset, *l2_slice, l2_entry, l2_bitmap;
+    int sc;
     unsigned int offset_in_cluster;
     uint64_t bytes_available, bytes_needed, nb_clusters;
-    QCow2ClusterType type;
+    QCow2SubclusterType type;
     int ret;
 
     offset_in_cluster = offset_into_cluster(s, offset);
@@ -552,13 +544,13 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
 
     l1_index = offset_to_l1_index(s, offset);
     if (l1_index >= s->l1_size) {
-        type = QCOW2_CLUSTER_UNALLOCATED;
+        type = QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
         goto out;
     }
 
     l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
     if (!l2_offset) {
-        type = QCOW2_CLUSTER_UNALLOCATED;
+        type = QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
         goto out;
     }
 
@@ -579,7 +571,9 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
     /* find the cluster offset for the given disk offset */
 
     l2_index = offset_to_l2_slice_index(s, offset);
+    sc_index = offset_to_sc_index(s, offset);
     l2_entry = get_l2_entry(s, l2_slice, l2_index);
+    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
     /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
@@ -587,9 +581,9 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
      * true */
     assert(nb_clusters <= INT_MAX);
 
-    type = qcow2_get_cluster_type(bs, l2_entry);
-    if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
-                                type == QCOW2_CLUSTER_ZERO_ALLOC)) {
+    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
+    if (s->qcow_version < 3 && (type == QCOW2_SUBCLUSTER_ZERO_PLAIN ||
+                                type == QCOW2_SUBCLUSTER_ZERO_ALLOC)) {
         qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
                                 " in pre-v3 image (L2 offset: %#" PRIx64
                                 ", L2 index: %#x)", l2_offset, l2_index);
@@ -597,7 +591,13 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
         goto fail;
     }
     switch (type) {
-    case QCOW2_CLUSTER_COMPRESSED:
+    case QCOW2_SUBCLUSTER_INVALID:
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found"
+                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
+                                l2_offset, l2_index);
+        ret = -EIO;
+        goto fail;
+    case QCOW2_SUBCLUSTER_COMPRESSED:
         if (has_data_file(bs)) {
             qcow2_signal_corruption(bs, true, -1, -1, "Compressed cluster "
                                     "entry found in image with external data "
@@ -607,21 +607,20 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
         /* Compressed clusters can only be processed one by one */
-        c = 1;
+        sc = s->subclusters_per_cluster - sc_index;
         *host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
         break;
-    case QCOW2_CLUSTER_ZERO_PLAIN:
-    case QCOW2_CLUSTER_UNALLOCATED:
-        /* how many empty clusters ? */
-        c = count_contiguous_clusters_unallocated(bs, nb_clusters,
-                                                  l2_slice, l2_index, type);
+    case QCOW2_SUBCLUSTER_ZERO_PLAIN:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
+        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                          l2_slice, l2_index);
         *host_offset = 0;
         break;
-    case QCOW2_CLUSTER_ZERO_ALLOC:
-    case QCOW2_CLUSTER_NORMAL:
-        /* how many allocated clusters ? */
-        c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
-                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
+    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
+    case QCOW2_SUBCLUSTER_NORMAL:
+    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
+        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
+                                          l2_slice, l2_index);
         *host_offset = l2_entry & L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *host_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1,
@@ -651,7 +650,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
-    bytes_available = (int64_t)c * s->cluster_size;
+    bytes_available = ((int64_t)sc + sc_index) << s->subcluster_bits;
 
 out:
     if (bytes_available > bytes_needed) {
@@ -664,7 +663,7 @@ out:
     assert(bytes_available - offset_in_cluster <= UINT_MAX);
     *bytes = bytes_available - offset_in_cluster;
 
-    *subcluster_type = qcow2_cluster_to_subcluster_type(type);
+    *subcluster_type = type;
 
     return 0;
 
-- 
2.20.1




* [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (17 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-22 11:06   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
                   ` (11 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
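A minimal sketch of the rule described above, for readers following the
thread (not part of the patch). L2Slot and zero_cluster() are hypothetical
names used only for illustration, and the two constants are assumptions
modelled on the names used in this series: the 'all zeroes' bits are taken
to be the upper 32 bits of the L2 bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on the names used in this series. */
#define QCOW_OFLAG_ZERO            (1ULL << 0)
#define QCOW_L2_BITMAP_ALL_ZEROES  ((uint64_t)UINT32_MAX << 32)

/* Hypothetical stand-in for one slot of an L2 table. */
typedef struct {
    uint64_t entry;   /* host offset plus flag bits */
    uint64_t bitmap;  /* only meaningful with extended L2 entries */
} L2Slot;

/*
 * Make a whole cluster read back as zeroes.
 * keep_mapping == false models the "compressed or unmap" branch, where
 * the old host cluster is freed and the new entry starts from 0.
 */
static inline L2Slot zero_cluster(L2Slot old, bool has_subclusters,
                                  bool keep_mapping)
{
    L2Slot s = { keep_mapping ? old.entry : 0, old.bitmap };

    if (has_subclusters) {
        /* The per-subcluster bits replace QCOW_OFLAG_ZERO. */
        s.bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
    } else {
        s.entry |= QCOW_OFLAG_ZERO;
    }
    return s;
}
```

With subclusters the entry word never gains QCOW_OFLAG_ZERO; without them
the bitmap is left untouched.
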
 block/qcow2-cluster.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6f2643ba53..746006a117 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1897,7 +1897,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
     assert(nb_clusters <= INT_MAX);
 
     for (i = 0; i < nb_clusters; i++) {
-        uint64_t old_offset;
+        uint64_t old_offset, l2_entry = 0;
         QCow2ClusterType cluster_type;
 
         old_offset = get_l2_entry(s, l2_slice, l2_index + i);
@@ -1914,12 +1914,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
         if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
-            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
-            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
+            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
         }
+
+        if (has_subclusters(s)) {
+            set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
+        } else {
+            l2_entry |= QCOW_OFLAG_ZERO;
+        }
+
+        set_l2_entry(s, l2_slice, l2_index + i, l2_entry);
     }
 
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
-- 
2.20.1




* [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (18 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-09 10:05   ` Max Reitz
  2020-04-22 11:35   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
                   ` (10 subsequent siblings)
  30 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Two changes are needed in this function:

1) A full discard deallocates a cluster, so we can skip the operation if
   it is already unallocated. With extended L2 entries, however, if any
   of the subclusters has the 'all zeroes' bit set, then we have to
   clear it.

2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
   image has extended L2 entries. Instead, the individual 'all zeroes'
   bits must be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
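The two changes can be sketched as a pair of pure functions (an
illustration only, not part of the patch). can_skip_unallocated(),
discarded_slot() and L2Slot are hypothetical names, and the constants are
assumed to match the ones used elsewhere in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on the names used in this series. */
#define QCOW_OFLAG_ZERO            (1ULL << 0)
#define QCOW_L2_BITMAP_ALL_ZEROES  ((uint64_t)UINT32_MAX << 32)

typedef struct {
    uint64_t entry;
    uint64_t bitmap;
} L2Slot;

/* Change 1: may an already-unallocated cluster be skipped? */
static inline bool can_skip_unallocated(bool full_discard, bool has_backing,
                                        bool has_subclusters, uint64_t bitmap)
{
    if (full_discard) {
        /* Stale 'all zeroes' bits must still be cleared. */
        return !has_subclusters || bitmap == 0;
    }
    return !has_backing;
}

/* Change 2: what the L2 slot looks like after the discard. */
static inline L2Slot discarded_slot(bool full_discard, bool has_subclusters,
                                    int qcow_version)
{
    L2Slot s = { 0, 0 };

    if (has_subclusters) {
        s.bitmap = full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES;
    } else if (!full_discard && qcow_version >= 3) {
        s.entry = QCOW_OFLAG_ZERO;
    }
    return s;
}
```
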
 block/qcow2-cluster.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 746006a117..824c710760 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
          * TODO We might want to use bdrv_block_status(bs) here, but we're
          * holding s->lock, so that doesn't work today.
          *
-         * If full_discard is true, the sector should not read back as zeroes,
+         * If full_discard is true, the cluster should not read back as zeroes,
          * but rather fall through to the backing file.
          */
         switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
         case QCOW2_CLUSTER_UNALLOCATED:
-            if (full_discard || !bs->backing) {
+            if (full_discard) {
+                /* If the image has extended L2 entries we can only
+                 * skip this operation if the L2 bitmap is zero. */
+                uint64_t bitmap = has_subclusters(s) ?
+                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
+                if (bitmap == 0) {
+                    continue;
+                }
+            } else if (!bs->backing) {
                 continue;
             }
             break;
@@ -1817,7 +1825,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (!full_discard && s->qcow_version >= 3) {
+        if (has_subclusters(s)) {
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
+            set_l2_bitmap(s, l2_slice, l2_index + i,
+                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+        } else if (!full_discard && s->qcow_version >= 3) {
             set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
             set_l2_entry(s, l2_slice, l2_index + i, 0);
-- 
2.20.1




* [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (19 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-22 12:06   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
                   ` (9 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
image has subclusters. Instead, the individual 'all zeroes' bits must
be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 3b89a97fd0..9337496c84 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1686,8 +1686,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
                         int ign = active ? QCOW2_OL_ACTIVE_L2 :
                                            QCOW2_OL_INACTIVE_L2;
 
-                        l2_entry = QCOW_OFLAG_ZERO;
-                        set_l2_entry(s, l2_table, i, l2_entry);
+                        if (has_subclusters(s)) {
+                            set_l2_entry(s, l2_table, i, 0);
+                            set_l2_bitmap(s, l2_table, i,
+                                          QCOW_L2_BITMAP_ALL_ZEROES);
+                        } else {
+                            set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
+                        }
                         ret = qcow2_pre_write_overlap_check(bs, ign,
                                 l2e_offset, l2_entry_size(s), false);
                         if (ret < 0) {
-- 
2.20.1




* [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (20 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-22 12:38   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
                   ` (8 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

l2meta_cow_start() and l2meta_cow_end() are not necessarily
cluster-aligned if the image has subclusters, so update the
calculation of old_start and old_end to guarantee that no two requests
try to write on the same cluster.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
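A small worked example of why the rounding matters (illustration only,
not part of the patch). cluster_floor(), cluster_ceil() and
ranges_intersect() are hypothetical helpers standing in for
start_of_cluster(), ROUND_UP() and the intersection test in
handle_dependencies(); the cluster size is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round an offset down to the start of its cluster. */
static inline uint64_t cluster_floor(uint64_t off, uint64_t cluster_size)
{
    return off & ~(cluster_size - 1);
}

/* Round an offset up to the next cluster boundary. */
static inline uint64_t cluster_ceil(uint64_t off, uint64_t cluster_size)
{
    return (off + cluster_size - 1) & ~(cluster_size - 1);
}

/* Same test as in handle_dependencies(): half-open ranges [start, end). */
static inline bool ranges_intersect(uint64_t start, uint64_t end,
                                    uint64_t old_start, uint64_t old_end)
{
    return !(end <= old_start || start >= old_end);
}
```

With 64 KiB clusters, an in-flight request whose COW area is
[0x14000, 0x18000) and a new request for [0x1c000, 0x1e000) touch the
same cluster. The unrounded ranges do not intersect, but once the old
range is widened to [0x10000, 0x20000) the dependency is detected.
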
 block/qcow2-cluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 824c710760..ceacd91ea3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1306,8 +1306,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 
         uint64_t start = guest_offset;
         uint64_t end = start + bytes;
-        uint64_t old_start = l2meta_cow_start(old_alloc);
-        uint64_t old_end = l2meta_cow_end(old_alloc);
+        uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
+        uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size);
 
         if (end <= old_start || start >= old_end) {
             /* No intersection */
-- 
2.20.1




* [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (21 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-24 19:39   ` Eric Blake
  2020-03-17 18:16 ` [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
                   ` (7 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The L2 bitmap needs to be updated after each write to indicate which
subclusters are now allocated.

This needs to happen even if the cluster was already allocated and the
L2 entry was otherwise valid.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
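The bitmap update can be sketched in isolation as follows (illustration
only, not part of the patch). mark_written() is a hypothetical helper; the
two macros are assumed to match the ones used in the diff below, with
allocation bits in the lower half of the bitmap and 'all zeroes' bits in
the upper half. The written range is relative to the start of the
allocation and is assumed to be subcluster-aligned.

```c
#include <stdint.h>

/* Assumed layout: bit x = allocated, bit x + 32 = reads as zeroes. */
#define QCOW_OFLAG_SUB_ALLOC(x)  (1ULL << (x))
#define QCOW_OFLAG_SUB_ZERO(x)   (QCOW_OFLAG_SUB_ALLOC(x) << 32)

/*
 * Return the new bitmap for cluster number cluster_idx of an allocation
 * after [written_from, written_to) has been written.
 */
static inline uint64_t mark_written(uint64_t l2_bitmap, unsigned cluster_idx,
                                    unsigned cluster_size,
                                    unsigned subclusters_per_cluster,
                                    unsigned written_from, unsigned written_to)
{
    unsigned subcluster_size = cluster_size / subclusters_per_cluster;

    for (unsigned sc = 0; sc < subclusters_per_cluster; sc++) {
        unsigned sc_off = cluster_idx * cluster_size + sc * subcluster_size;

        if (sc_off >= written_from && sc_off < written_to) {
            l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);  /* now allocated */
            l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);  /* no longer zero */
        }
    }
    return l2_bitmap;
}
```

For example, with 64 KiB clusters and 32 subclusters of 2 KiB each, a
write covering [4096, 8192) in the first cluster sets the allocation bits
of subclusters 2 and 3 and clears their 'all zeroes' bits, leaving all
other bits as they were.
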
 block/qcow2-cluster.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index ceacd91ea3..dfd8b66958 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1006,6 +1006,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
         assert((offset & L2E_OFFSET_MASK) == offset);
 
         set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
+
+        /* Update bitmap with the subclusters that were just written */
+        if (has_subclusters(s)) {
+            unsigned written_from = m->cow_start.offset;
+            unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
+                m->nb_clusters << s->cluster_bits;
+            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
+            int sc;
+            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
+                int sc_off = i * s->cluster_size + sc * s->subcluster_size;
+                if (sc_off >= written_from && sc_off < written_to) {
+                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
+                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
+                }
+            }
+            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);
+        }
      }
 
 
-- 
2.20.1




* [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (22 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-24 17:02   ` Alberto Garcia
  2020-03-17 18:16 ` [PATCH v4 25/30] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Compressed clusters always have the bitmap part of the extended L2
entry set to 0.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index dfd8b66958..1f471db98c 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -806,6 +806,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
     set_l2_entry(s, l2_slice, l2_index, cluster_offset);
+    if (has_subclusters(s)) {
+        set_l2_bitmap(s, l2_slice, l2_index, 0);
+    }
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
-- 
2.20.1




* [PATCH v4 25/30] qcow2: Add subcluster support to handle_alloc_space()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (23 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-27 11:54   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 26/30] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
                   ` (5 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

The bdrv_co_pwrite_zeroes() call here fills complete clusters with
zeroes, but it can happen that some subclusters are not part of the
write request or the copy-on-write. This patch makes sure that only
the affected subclusters are overwritten.

A potential improvement would be to also fill the other subclusters
with zeroes if we can guarantee that we are not overwriting existing
data. However, this would waste more disk space, so we should first
evaluate whether it's really worth doing.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
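The range that gets zeroed can be worked out as below (illustration only,
not part of the patch). CowRegion and Meta are simplified, hypothetical
stand-ins for the COW regions and QCowL2Meta used in the diff; only the
fields needed for the arithmetic are kept.

```c
#include <stdint.h>

/* Simplified stand-ins; offsets are relative to the allocation start. */
typedef struct {
    unsigned offset;
    unsigned nb_bytes;
} CowRegion;

typedef struct {
    uint64_t alloc_offset;   /* host offset of the first cluster */
    CowRegion cow_start;
    CowRegion cow_end;
} Meta;

/* First host byte that is zeroed. */
static inline uint64_t zero_start(const Meta *m)
{
    return m->alloc_offset + m->cow_start.offset;
}

/* Length from the start of the head COW to the end of the tail COW. */
static inline unsigned zero_len(const Meta *m)
{
    return m->cow_end.offset + m->cow_end.nb_bytes - m->cow_start.offset;
}
```

With alloc_offset 0x50000, a head COW region at offset 0x800 and a tail
COW region of 0x800 bytes at offset 0x2000, the zeroed range is 0x2000
bytes starting at 0x50800 instead of the whole cluster.
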
 block/qcow2.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 88daaf11a0..ad230ed1b1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2349,6 +2349,9 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
+        uint64_t start_offset = m->alloc_offset + m->cow_start.offset;
+        unsigned nb_bytes = m->cow_end.offset + m->cow_end.nb_bytes -
+            m->cow_start.offset;
 
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
             continue;
@@ -2363,16 +2366,14 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
          * efficiently zero out the whole clusters
          */
 
-        ret = qcow2_pre_write_overlap_check(bs, 0, m->alloc_offset,
-                                            m->nb_clusters * s->cluster_size,
+        ret = qcow2_pre_write_overlap_check(bs, 0, start_offset, nb_bytes,
                                             true);
         if (ret < 0) {
             return ret;
         }
 
         BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
-        ret = bdrv_co_pwrite_zeroes(s->data_file, m->alloc_offset,
-                                    m->nb_clusters * s->cluster_size,
+        ret = bdrv_co_pwrite_zeroes(s->data_file, start_offset, nb_bytes,
                                     BDRV_REQ_NO_FALLBACK);
         if (ret < 0) {
             if (ret != -ENOTSUP && ret != -EAGAIN) {
-- 
2.20.1




* [PATCH v4 26/30] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (24 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 25/30] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-27 11:59   ` Vladimir Sementsov-Ogievskiy
  2020-03-17 18:16 ` [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters Alberto Garcia
                   ` (4 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Ideally it should be possible to zero individual subclusters using
this function, but this is currently not implemented.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ad230ed1b1..d406ef355b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3743,7 +3743,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         bytes = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
-        if (ret < 0 ||
+        /* TODO: allow zeroing separate subclusters, we only allow
+         * zeroing full clusters at the moment. */
+        if (ret < 0 || nr != bytes ||
             (type != QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN &&
              type != QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC &&
              type != QCOW2_SUBCLUSTER_ZERO_PLAIN &&
-- 
2.20.1




* [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (25 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 26/30] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-09 10:27   ` Max Reitz
  2020-03-17 18:16 ` [PATCH v4 28/30] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
                   ` (3 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

This function is only used by qcow2_expand_zero_clusters() to
downgrade a qcow2 image to a previous version. It is however not
possible to downgrade an image with extended L2 entries because older
versions of qcow2 do not have this feature.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c      | 8 +++++++-
 tests/qemu-iotests/061     | 6 ++++++
 tests/qemu-iotests/061.out | 5 +++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 1f471db98c..125d2852f6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2039,6 +2039,9 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
     int ret;
     int i, j;
 
+    /* qcow2_downgrade() is not allowed in images with subclusters */
+    assert(!has_subclusters(s));
+
     slice_size2 = s->l2_slice_size * l2_entry_size(s);
     n_slices = s->cluster_size / slice_size2;
 
@@ -2107,7 +2110,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN) {
                     if (!bs->backing) {
                         /* not backed; therefore we can simply deallocate the
-                         * cluster */
+                         * cluster. No need to call set_l2_bitmap(), this
+                         * function doesn't support images with subclusters. */
                         set_l2_entry(s, l2_slice, j, 0);
                         l2_dirty = true;
                         continue;
@@ -2178,6 +2182,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
                 } else {
                     set_l2_entry(s, l2_slice, j, offset);
                 }
+                /* No need to call set_l2_bitmap() after set_l2_entry() because
+                 * this function doesn't support images with subclusters. */
                 l2_dirty = true;
             }
 
diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
index 36b040491f..66bfd23179 100755
--- a/tests/qemu-iotests/061
+++ b/tests/qemu-iotests/061
@@ -266,6 +266,12 @@ $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
 _img_info --format-specific
 _check_test_img
 
+echo
+echo "=== Testing version downgrade with extended L2 entries ==="
+echo
+_make_test_img -o "compat=1.1,extended_l2=on" 64M
+$QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
+
 echo
 echo "=== Try changing the external data file ==="
 echo
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 8b3091a412..5d009867a2 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -498,6 +498,11 @@ Format specific information:
     corrupt: false
 No errors were found on the image.
 
+=== Testing version downgrade with extended L2 entries ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+qemu-img: Cannot downgrade an image with incompatible features 0x10 set
+
 === Try changing the external data file ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
-- 
2.20.1




* [PATCH v4 28/30] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (26 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-09 14:49   ` Eric Blake
  2020-03-17 18:16 ` [PATCH v4 29/30] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
                   ` (2 subsequent siblings)
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Now that the implementation of subclusters is complete we can finally
add the necessary options to create and read images with this feature,
which we call "extended L2 entries".

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
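A standalone sketch of the two checks this patch introduces (illustration
only, not part of the patch). cluster_size_ok() is a hypothetical
condensed version of validate_cluster_size(), and has_subclusters() here
takes the feature word directly instead of a BDRVQcow2State. The values
of MIN_CLUSTER_BITS, MAX_CLUSTER_BITS and
QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER are not shown in this patch and are
assumed to be 9, 21 and 32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; only the EXTL2 bit number appears in this patch. */
#define MIN_CLUSTER_BITS                    9
#define MAX_CLUSTER_BITS                    21
#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER  32
#define QCOW2_INCOMPAT_EXTL2_BITNR          4
#define QCOW2_INCOMPAT_EXTL2                (1 << QCOW2_INCOMPAT_EXTL2_BITNR)

/* Power of two in range, and large enough to hold 32 subclusters. */
static inline bool cluster_size_ok(size_t cluster_size, bool extended_l2)
{
    size_t min_size = (size_t)1 << MIN_CLUSTER_BITS;
    size_t max_size = (size_t)1 << MAX_CLUSTER_BITS;

    if (cluster_size < min_size || cluster_size > max_size ||
        (cluster_size & (cluster_size - 1)) != 0) {
        return false;
    }
    if (extended_l2 &&
        cluster_size < min_size * QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER) {
        return false;
    }
    return true;
}

/* The feature is on when incompatible bit 4 is set in the header. */
static inline bool has_subclusters(uint64_t incompatible_features)
{
    return (incompatible_features & QCOW2_INCOMPAT_EXTL2) != 0;
}
```

Under these assumptions the smallest cluster size accepted with
extended_l2=on is 16 KiB, and the feature bit has the value 0x10, which
matches the "incompatible features 0x10" message in the 061 test output
of patch 27.
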
 qapi/block-core.json             |   7 +++
 block/qcow2.h                    |   8 ++-
 include/block/block_int.h        |   1 +
 block/qcow2.c                    |  65 ++++++++++++++++++--
 tests/qemu-iotests/031.out       |   8 +--
 tests/qemu-iotests/036.out       |   4 +-
 tests/qemu-iotests/049.out       | 102 +++++++++++++++----------------
 tests/qemu-iotests/060.out       |   1 +
 tests/qemu-iotests/061.out       |  20 +++---
 tests/qemu-iotests/065           |  18 ++++--
 tests/qemu-iotests/082.out       |  48 ++++++++++++---
 tests/qemu-iotests/085.out       |  38 ++++++------
 tests/qemu-iotests/144.out       |   4 +-
 tests/qemu-iotests/182.out       |   2 +-
 tests/qemu-iotests/185.out       |   8 +--
 tests/qemu-iotests/198.out       |   2 +
 tests/qemu-iotests/206.out       |   4 ++
 tests/qemu-iotests/242.out       |   5 ++
 tests/qemu-iotests/255.out       |   8 +--
 tests/qemu-iotests/280.out       |   2 +-
 tests/qemu-iotests/common.filter |   1 +
 21 files changed, 240 insertions(+), 116 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 91586fb1fb..6161d6c03a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -66,6 +66,9 @@
 #                 standalone (read-only) raw image without looking at qcow2
 #                 metadata (since: 4.0)
 #
+# @extended-l2: true if the image has extended L2 entries; only valid for
+#               compat >= 1.1 (since 5.0)
+#
 # @lazy-refcounts: on or off; only valid for compat >= 1.1
 #
 # @corrupt: true if the image has been marked corrupt; only valid for
@@ -85,6 +88,7 @@
       'compat': 'str',
       '*data-file': 'str',
       '*data-file-raw': 'bool',
+      '*extended-l2': 'bool',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
       'refcount-bits': 'int',
@@ -4270,6 +4274,8 @@
 # @data-file-raw: True if the external data file must stay valid as a
 #                 standalone (read-only) raw image without looking at qcow2
 #                 metadata (default: false; since: 4.0)
+# @extended-l2: True to make the image have extended L2 entries
+#               (default: false; since 5.0)
 # @size: Size of the virtual disk in bytes
 # @version: Compatibility level (default: v3)
 # @backing-file: File name of the backing file if a backing file
@@ -4288,6 +4294,7 @@
   'data': { 'file':             'BlockdevRef',
             '*data-file':       'BlockdevRef',
             '*data-file-raw':   'bool',
+            '*extended-l2':     'bool',
             'size':             'size',
             '*version':         'BlockdevQcow2Version',
             '*backing-file':    'str',
diff --git a/block/qcow2.h b/block/qcow2.h
index 031ce823b3..a5506d66d0 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -231,13 +231,16 @@ enum {
     QCOW2_INCOMPAT_DIRTY_BITNR      = 0,
     QCOW2_INCOMPAT_CORRUPT_BITNR    = 1,
     QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+    QCOW2_INCOMPAT_EXTL2_BITNR      = 4,
     QCOW2_INCOMPAT_DIRTY            = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
     QCOW2_INCOMPAT_CORRUPT          = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
     QCOW2_INCOMPAT_DATA_FILE        = 1 << QCOW2_INCOMPAT_DATA_FILE_BITNR,
+    QCOW2_INCOMPAT_EXTL2            = 1 << QCOW2_INCOMPAT_EXTL2_BITNR,
 
     QCOW2_INCOMPAT_MASK             = QCOW2_INCOMPAT_DIRTY
                                     | QCOW2_INCOMPAT_CORRUPT
-                                    | QCOW2_INCOMPAT_DATA_FILE,
+                                    | QCOW2_INCOMPAT_DATA_FILE
+                                    | QCOW2_INCOMPAT_EXTL2,
 };
 
 /* Compatible feature bits */
@@ -552,8 +555,7 @@ typedef enum QCow2MetadataOverlap {
 
 static inline bool has_subclusters(BDRVQcow2State *s)
 {
-    /* FIXME: Return false until this feature is complete */
-    return false;
+    return s->incompatible_features & QCOW2_INCOMPAT_EXTL2;
 }
 
 static inline size_t l2_entry_size(BDRVQcow2State *s)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ae9c4da4d0..5c2d02de22 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,6 +57,7 @@
 #define BLOCK_OPT_REFCOUNT_BITS     "refcount_bits"
 #define BLOCK_OPT_DATA_FILE         "data_file"
 #define BLOCK_OPT_DATA_FILE_RAW     "data_file_raw"
+#define BLOCK_OPT_EXTL2             "extended_l2"
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
diff --git a/block/qcow2.c b/block/qcow2.c
index d406ef355b..77b2713533 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1385,6 +1385,12 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
     s->subcluster_size = s->cluster_size / s->subclusters_per_cluster;
     s->subcluster_bits = ctz32(s->subcluster_size);
 
+    if (s->subcluster_size < (1 << MIN_CLUSTER_BITS)) {
+        error_setg(errp, "Unsupported subcluster size: %d", s->subcluster_size);
+        ret = -EINVAL;
+        goto fail;
+    }
+
     /* Check support for various header values */
     if (header.refcount_order > 6) {
         error_setg(errp, "Reference count entry width too large; may not "
@@ -2843,6 +2849,11 @@ int qcow2_update_header(BlockDriverState *bs)
                 .bit  = QCOW2_INCOMPAT_DATA_FILE_BITNR,
                 .name = "external data file",
             },
+            {
+                .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+                .bit  = QCOW2_INCOMPAT_EXTL2_BITNR,
+                .name = "extended L2 entries",
+            },
             {
                 .type = QCOW2_FEAT_TYPE_COMPATIBLE,
                 .bit  = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
@@ -3176,7 +3187,8 @@ static int64_t qcow2_calc_prealloc_size(int64_t total_size,
     return meta_size + aligned_total_size;
 }
 
-static bool validate_cluster_size(size_t cluster_size, Error **errp)
+static bool validate_cluster_size(size_t cluster_size, bool extended_l2,
+                                  Error **errp)
 {
     int cluster_bits = ctz32(cluster_size);
     if (cluster_bits < MIN_CLUSTER_BITS || cluster_bits > MAX_CLUSTER_BITS ||
@@ -3186,16 +3198,28 @@ static bool validate_cluster_size(size_t cluster_size, Error **errp)
                    "%dk", 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
         return false;
     }
+
+    if (extended_l2) {
+        unsigned min_cluster_size =
+            (1 << MIN_CLUSTER_BITS) * QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER;
+        if (cluster_size < min_cluster_size) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "cluster sizes of at least %u bytes", min_cluster_size);
+            return false;
+        }
+    }
+
     return true;
 }
 
-static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, Error **errp)
+static size_t qcow2_opt_get_cluster_size_del(QemuOpts *opts, bool extended_l2,
+                                             Error **errp)
 {
     size_t cluster_size;
 
     cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
                                          DEFAULT_CLUSTER_SIZE);
-    if (!validate_cluster_size(cluster_size, errp)) {
+    if (!validate_cluster_size(cluster_size, extended_l2, errp)) {
         return 0;
     }
     return cluster_size;
@@ -3309,7 +3333,20 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
         cluster_size = DEFAULT_CLUSTER_SIZE;
     }
 
-    if (!validate_cluster_size(cluster_size, errp)) {
+    if (!qcow2_opts->has_extended_l2) {
+        qcow2_opts->extended_l2 = false;
+    }
+    if (qcow2_opts->extended_l2) {
+        if (version < 3) {
+            error_setg(errp, "Extended L2 entries are only supported with "
+                       "compatibility level 1.1 and above (use version=v3 or "
+                       "greater)");
+            ret = -EINVAL;
+            goto out;
+        }
+    }
+
+    if (!validate_cluster_size(cluster_size, qcow2_opts->extended_l2, errp)) {
         ret = -EINVAL;
         goto out;
     }
@@ -3429,6 +3466,11 @@ qcow2_co_create(BlockdevCreateOptions *create_options, Error **errp)
             cpu_to_be64(QCOW2_AUTOCLEAR_DATA_FILE_RAW);
     }
 
+    if (qcow2_opts->extended_l2) {
+        header->incompatible_features |=
+            cpu_to_be64(QCOW2_INCOMPAT_EXTL2);
+    }
+
     ret = blk_pwrite(blk, 0, header, cluster_size, 0);
     g_free(header);
     if (ret < 0) {
@@ -3607,6 +3649,7 @@ static int coroutine_fn qcow2_co_create_opts(const char *filename, QemuOpts *opt
         { BLOCK_OPT_BACKING_FMT,        "backing-fmt" },
         { BLOCK_OPT_CLUSTER_SIZE,       "cluster-size" },
         { BLOCK_OPT_LAZY_REFCOUNTS,     "lazy-refcounts" },
+        { BLOCK_OPT_EXTL2,              "extended-l2" },
         { BLOCK_OPT_REFCOUNT_BITS,      "refcount-bits" },
         { BLOCK_OPT_ENCRYPT,            BLOCK_OPT_ENCRYPT_FORMAT },
         { BLOCK_OPT_COMPAT_LEVEL,       "version" },
@@ -4636,9 +4679,13 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     PreallocMode prealloc;
     bool has_backing_file;
     bool has_luks;
+    bool extended_l2;
 
     /* Parse image creation options */
-    cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
+    extended_l2 = qemu_opt_get_bool_del(opts, BLOCK_OPT_EXTL2, false);
+
+    cluster_size = qcow2_opt_get_cluster_size_del(opts, extended_l2,
+                                                  &local_err);
     if (local_err) {
         goto err;
     }
@@ -4832,6 +4879,8 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .has_extended_l2    = true,
+            .extended_l2        = has_subclusters(s),
             .refcount_bits      = s->refcount_bits,
             .has_bitmaps        = !!bitmaps,
             .bitmaps            = bitmaps,
@@ -5490,6 +5539,12 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_EXTL2,
+            .type = QEMU_OPT_BOOL,
+            .help = "Extended L2 tables",
+            .def_value_str = "off"
+        },
         {
             .name = BLOCK_OPT_REFCOUNT_BITS,
             .type = QEMU_OPT_NUMBER,
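
The cluster-size rule that validate_cluster_size() gains in the hunks above can be tried out in isolation. This is only a minimal sketch, not the QEMU code itself: the values 9 for MIN_CLUSTER_BITS and 32 for QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER are assumptions (the real definitions are in block/qcow2.h and are not shown in this patch), and extl2_cluster_size_ok() is a hypothetical helper name that mirrors only the extended_l2 branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values; the real definitions live in block/qcow2.h. */
#define MIN_CLUSTER_BITS 9
#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32

/*
 * Hypothetical helper that mirrors only the extended_l2 branch of
 * validate_cluster_size(); the generic range and power-of-two checks
 * from the original function are left out.
 */
static bool extl2_cluster_size_ok(size_t cluster_size, bool extended_l2)
{
    if (extended_l2) {
        /*
         * Smallest cluster for which each of the subclusters is still
         * no smaller than the minimum cluster size.
         */
        unsigned min_cluster_size =
            (1u << MIN_CLUSTER_BITS) * QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER;
        if (cluster_size < min_cluster_size) {
            return false;
        }
    }
    return true;
}
```

Under those assumed values the threshold works out to 512 * 32 = 16384 bytes, so extl2_cluster_size_ok(8192, true) is rejected while the same size with extended_l2 off is still accepted.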
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index d535e407bc..d9cfdad79b 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -117,7 +117,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -150,7 +150,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
@@ -164,7 +164,7 @@ No errors were found on the image.
 
 magic                     0x514649fb
 version                   3
-backing_file_offset       0x178
+backing_file_offset       0x1a8
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -188,7 +188,7 @@ data                      'host_device'
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 Header extension:
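
The change from 192 to 240 in these expected outputs is consistent with one more entry in the feature name table. A small sketch of that arithmetic follows. It assumes the entry layout from docs/interop/qcow2.txt (1 byte type, 1 byte bit number, 46 bytes of zero-padded name); the struct and helper names are illustrative and not taken from QEMU.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed on-disk layout of one feature name table entry (48 bytes). */
typedef struct FeatureNameEntry {
    uint8_t type;      /* 0 = incompatible, 1 = compatible, 2 = autoclear */
    uint8_t bit;       /* bit number within the feature field */
    char    name[46];  /* feature name, padded with zeros */
} FeatureNameEntry;

/* Hypothetical helper: length in bytes of a table with n_entries entries. */
static size_t feature_table_length(size_t n_entries)
{
    return n_entries * sizeof(FeatureNameEntry);
}
```

With that layout, 192 / 48 gives four entries before the patch and 240 / 48 gives five after it. The backing_file_offset in the 031 output moves by the same 48 bytes, from 0x178 to 0x1a8.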
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index 0b52b934e1..fb509f6357 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -26,7 +26,7 @@ compatible_features       []
 autoclear_features        [63]
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 
@@ -38,7 +38,7 @@ compatible_features       []
 autoclear_features        []
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 *** done
diff --git a/tests/qemu-iotests/049.out b/tests/qemu-iotests/049.out
index affa55b341..191637dfaf 100644
--- a/tests/qemu-iotests/049.out
+++ b/tests/qemu-iotests/049.out
@@ -4,90 +4,90 @@ QA output created by 049
 == 1. Traditional size parameter ==
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1024.0b
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5k
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5K
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5G
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 TEST_DIR/t.qcow2 1.5T
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 2. Specifying size via -o ==
 
 qemu-img create -f qcow2 -o size=1024 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1048576 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1099511627776 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0 TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1024.0b TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1024 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5k TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5K TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1536 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5M TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1572864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5G TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1610612736 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o size=1.5T TEST_DIR/t.qcow2
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=1649267441664 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == 3. Invalid sizes ==
 
@@ -129,84 +129,84 @@ qemu-img: TEST_DIR/t.qcow2: The image size must be specified only once
 == Check correct interpretation of suffixes for cluster size ==
 
 qemu-img create -f qcow2 -o cluster_size=1024 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1048576 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=1024.0b TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=1024 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5k TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5K TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=512 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o cluster_size=0.5M TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=524288 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check compat level option ==
 
 qemu-img create -f qcow2 -o compat=0.10 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.42 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '0.42'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.42 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=foobar TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter 'foobar'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=foobar cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check preallocation option ==
 
 qemu-img create -f qcow2 -o preallocation=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=off lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=metadata TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=metadata lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o preallocation=1234 TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Invalid parameter '1234'
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 cluster_size=65536 preallocation=1234 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check encryption option ==
 
 qemu-img create -f qcow2 -o encryption=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 --object secret,id=sec0,data=123456 -o encryption=on,encrypt.key-secret=sec0 TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 encryption=on encrypt.key-secret=sec0 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 == Check lazy_refcounts option (only with v3) ==
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=1.1 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=off TEST_DIR/t.qcow2 64M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 qemu-img create -f qcow2 -o compat=0.10,lazy_refcounts=on TEST_DIR/t.qcow2 64M
 qemu-img: TEST_DIR/t.qcow2: Lazy refcounts only supported with compatibility level 1.1 and above (use version=v3 or greater)
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 compat=0.10 cluster_size=65536 lazy_refcounts=on extended_l2=off refcount_bits=16
 
 *** done
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index d27692a33c..8c56e5f062 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -20,6 +20,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: true
+    extended l2: false
 qemu-io: can't open device TEST_DIR/t.IMGFMT: IMGFMT: Image is corrupt; cannot be opened read/write
 no file open, try 'help open'
 read 512/512 bytes at offset 0
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 5d009867a2..4390458bf5 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -26,7 +26,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -84,7 +84,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -140,7 +140,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -195,7 +195,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 magic                     0x514649fb
@@ -264,7 +264,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 65536/65536 bytes at offset 44040192
@@ -298,7 +298,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 ERROR cluster 5 refcount=0 reference=1
@@ -327,7 +327,7 @@ header_length             104
 
 Header extension:
 magic                     0x6803f857
-length                    192
+length                    240
 data                      <binary>
 
 read 131072/131072 bytes at offset 0
@@ -496,6 +496,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 
 === Testing version downgrade with extended L2 entries ===
@@ -521,6 +522,7 @@ Format specific information:
     data file: foo
     data file raw: false
     corrupt: false
+    extended l2: false
 
 qemu-img: Could not open 'TEST_DIR/t.IMGFMT': 'data-file' is required for this image
 image: TEST_DIR/t.IMGFMT
@@ -533,6 +535,7 @@ Format specific information:
     refcount bits: 16
     data file raw: false
     corrupt: false
+    extended l2: false
 
 === Clearing and setting data-file-raw ===
 
@@ -548,6 +551,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: true
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
@@ -560,6 +564,7 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 qemu-img: data-file-raw cannot be set on existing images
 image: TEST_DIR/t.IMGFMT
@@ -573,5 +578,6 @@ Format specific information:
     data file: TEST_DIR/t.IMGFMT.data
     data file raw: false
     corrupt: false
+    extended l2: false
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/065 b/tests/qemu-iotests/065
index 6426474271..386add05ae 100755
--- a/tests/qemu-iotests/065
+++ b/tests/qemu-iotests/065
@@ -95,17 +95,21 @@ class TestQCow3NotLazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled'''
     img_options = 'compat=1.1,lazy_refcounts=off'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: false',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3Lazy(TestQemuImgInfo):
     '''Testing a qcow2 version 3 image with lazy refcounts enabled'''
     img_options = 'compat=1.1,lazy_refcounts=on'
     json_compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                     'refcount-bits': 16, 'corrupt': False }
+                     'refcount-bits': 16, 'corrupt': False,
+                     'extended-l2': False }
     human_compare = [ 'compat: 1.1', 'lazy refcounts: true',
-                      'refcount bits: 16', 'corrupt: false' ]
+                      'refcount bits: 16', 'corrupt: false',
+                      'extended l2: false' ]
 
 class TestQCow3NotLazyQMP(TestQMP):
     '''Testing a qcow2 version 3 image with lazy refcounts disabled, opening
@@ -113,7 +117,8 @@ class TestQCow3NotLazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=off'
     qemu_options = 'lazy-refcounts=on'
     compare = { 'compat': '1.1', 'lazy-refcounts': False,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 
 class TestQCow3LazyQMP(TestQMP):
@@ -122,7 +127,8 @@ class TestQCow3LazyQMP(TestQMP):
     img_options = 'compat=1.1,lazy_refcounts=on'
     qemu_options = 'lazy-refcounts=off'
     compare = { 'compat': '1.1', 'lazy-refcounts': True,
-                'refcount-bits': 16, 'corrupt': False }
+                'refcount-bits': 16, 'corrupt': False,
+                'extended-l2': False }
 
 TestImageInfoSpecific = None
 TestQemuImgInfo = None
diff --git a/tests/qemu-iotests/082.out b/tests/qemu-iotests/082.out
index 9d4ed4dc9d..2a01e8bac2 100644
--- a/tests/qemu-iotests/082.out
+++ b/tests/qemu-iotests/082.out
@@ -3,14 +3,14 @@ QA output created by 082
 === create: Options specified more than once ===
 
 Testing: create -f foo -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
 cluster_size: 65536
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=4096 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -20,9 +20,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=on extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -32,9 +33,10 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: create -f qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=8192 lazy_refcounts=off extended_l2=off refcount_bits=16
 image: TEST_DIR/t.IMGFMT
 file format: IMGFMT
 virtual size: 128 MiB (134217728 bytes)
@@ -59,6 +61,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -82,6 +85,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -105,6 +109,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -128,6 +133,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -151,6 +157,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -174,6 +181,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -197,6 +205,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -220,6 +229,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -227,10 +237,10 @@ Supported options:
   size=<size>            - Virtual disk size
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,help TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,help cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -u -o backing_file=TEST_DIR/t.qcow2,,? TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2,,? cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: create -f qcow2 -o backing_file=TEST_DIR/t.qcow2, -o help TEST_DIR/t.qcow2 128M
 qemu-img: Invalid option list: backing_file=TEST_DIR/t.qcow2,
@@ -258,6 +268,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -279,7 +290,7 @@ qemu-img: Format driver 'bochs' does not support image creation
 === convert: Options specified more than once ===
 
 Testing: create -f qcow2 TEST_DIR/t.qcow2 128M
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 Testing: convert -f foo -f qcow2 TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -302,6 +313,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k -o lazy_refcounts=on -o cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -313,6 +325,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: convert -O qcow2 -o cluster_size=4k,cluster_size=8k TEST_DIR/t.qcow2 TEST_DIR/t.qcow2.base
 image: TEST_DIR/t.IMGFMT.base
@@ -339,6 +352,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -362,6 +376,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -385,6 +400,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -408,6 +424,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -431,6 +448,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -454,6 +472,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -477,6 +496,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -500,6 +520,7 @@ Supported options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   nocow=<bool (on/off)>  - Turn off copy-on-write (valid only on btrfs)
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
@@ -538,6 +559,7 @@ Supported qcow2 options:
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -582,6 +604,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=130M -o lazy_refcounts=off TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -593,6 +616,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=8M -o lazy_refcounts=on -o size=132M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -604,6 +628,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Testing: amend -f qcow2 -o size=4M,size=148M TEST_DIR/t.qcow2
 image: TEST_DIR/t.IMGFMT
@@ -630,6 +655,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -654,6 +680,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -678,6 +705,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -702,6 +730,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -726,6 +755,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -750,6 +780,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -774,6 +805,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -798,6 +830,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
@@ -839,6 +872,7 @@ Creation options for 'qcow2':
   encrypt.ivgen-hash-alg=<str> - Name of IV generator hash algorithm
   encrypt.key-secret=<str> - ID of secret providing qcow AES key or LUKS passphrase
   encryption=<bool (on/off)> - Encrypt the image with format 'aes'. (Deprecated in favor of encrypt.format=aes)
+  extended_l2=<bool (on/off)> - Extended L2 tables
   lazy_refcounts=<bool (on/off)> - Postpone refcount updates
   preallocation=<str>    - Preallocation mode (allowed values: off, metadata, falloc, full)
   refcount_bits=<num>    - Width of a reference count entry in bits
diff --git a/tests/qemu-iotests/085.out b/tests/qemu-iotests/085.out
index fd11aae678..0142b2265f 100644
--- a/tests/qemu-iotests/085.out
+++ b/tests/qemu-iotests/085.out
@@ -13,7 +13,7 @@ Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=134217728
 === Create a single snapshot on virtio0 ===
 
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/1-snapshot-v0.IMGFMT', 'format': 'IMGFMT' } }
-Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.1 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Invalid command - missing device and nodename ===
@@ -30,40 +30,40 @@ Formatting 'TEST_DIR/1-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file
 === Create several transactional group snapshots ===
 
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/2-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/2-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/1-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/2-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/t.qcow2.2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/3-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/3-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/3-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/2-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/4-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/4-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/4-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/3-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/5-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/5-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/5-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/4-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/6-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/6-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/6-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/5-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/7-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/7-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/7-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/6-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/8-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/8-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/8-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/7-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/9-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/9-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/9-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/8-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'transaction', 'arguments': {'actions': [ { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio0', 'snapshot-file': 'TEST_DIR/10-snapshot-v0.IMGFMT' } }, { 'type': 'blockdev-snapshot-sync', 'data' : { 'device': 'virtio1', 'snapshot-file': 'TEST_DIR/10-snapshot-v1.IMGFMT' } } ] } }
-Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
-Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v0.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v0.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
+Formatting 'TEST_DIR/10-snapshot-v1.qcow2', fmt=qcow2 size=134217728 backing_file=TEST_DIR/9-snapshot-v1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Create a couple of snapshots using blockdev-snapshot ===
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index c7aa2e4820..5d9aceaf13 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -9,7 +9,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/tmp.IMGFMT', 'format': 'IMGFMT' } }
-Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Performing block-commit on active layer ===
@@ -31,6 +31,6 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/
 === Performing Live Snapshot 2 ===
 
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'virtio0', 'snapshot-file':'TEST_DIR/tmp2.IMGFMT', 'format': 'IMGFMT' } }
-Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 *** done
diff --git a/tests/qemu-iotests/182.out b/tests/qemu-iotests/182.out
index a8eea166c3..84dc7a2360 100644
--- a/tests/qemu-iotests/182.out
+++ b/tests/qemu-iotests/182.out
@@ -13,7 +13,7 @@ Is another process using the image [TEST_DIR/t.qcow2]?
 {'execute': 'blockdev-add', 'arguments': { 'node-name': 'node0', 'driver': 'file', 'filename': 'TEST_DIR/t.IMGFMT', 'locking': 'on' } }
 {"return": {}}
 {'execute': 'blockdev-snapshot-sync', 'arguments': { 'node-name': 'node0', 'snapshot-file': 'TEST_DIR/t.IMGFMT.overlay', 'snapshot-node-name': 'node1' } }
-Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.overlay', fmt=qcow2 size=197120 backing_file=TEST_DIR/t.qcow2 backing_fmt=file cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 {'execute': 'blockdev-add', 'arguments': { 'node-name': 'node1', 'driver': 'file', 'filename': 'TEST_DIR/t.IMGFMT', 'locking': 'on' } }
 {"return": {}}
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index 9a3b65782b..859fa7daaa 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -9,14 +9,14 @@ Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
 === Creating backing chain ===
 
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'disk', 'snapshot-file': 'TEST_DIR/t.IMGFMT.mid', 'format': 'IMGFMT', 'mode': 'absolute-paths' } }
-Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 { 'execute': 'human-monitor-command', 'arguments': { 'command-line': 'qemu-io disk "write 0 4M"' } }
 wrote 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 {"return": ""}
 { 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'disk', 'snapshot-file': 'TEST_DIR/t.IMGFMT', 'format': 'IMGFMT', 'mode': 'absolute-paths' } }
-Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"return": {}}
 
 === Start commit job and exit qemu ===
@@ -48,7 +48,7 @@ Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.q
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 { 'execute': 'drive-mirror', 'arguments': { 'device': 'disk', 'target': 'TEST_DIR/t.IMGFMT.copy', 'format': 'IMGFMT', 'sync': 'full', 'speed': 65536 } }
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"return": {}}
@@ -62,7 +62,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 l
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
 { 'execute': 'drive-backup', 'arguments': { 'device': 'disk', 'target': 'TEST_DIR/t.IMGFMT.copy', 'format': 'IMGFMT', 'sync': 'full', 'speed': 65536 } }
-Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "disk"}}
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
index 831ce3a289..f1e8cf7bab 100644
--- a/tests/qemu-iotests/198.out
+++ b/tests/qemu-iotests/198.out
@@ -72,6 +72,7 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 
 == checking image layer ==
 image: json:{ /* filtered */ }
@@ -115,4 +116,5 @@ Format specific information:
                 key offset: 1810432
         payload offset: 2068480
         master key iters: 1024
+    extended l2: false
 *** done
diff --git a/tests/qemu-iotests/206.out b/tests/qemu-iotests/206.out
index 61e7241e0b..d2efc0394a 100644
--- a/tests/qemu-iotests/206.out
+++ b/tests/qemu-iotests/206.out
@@ -21,6 +21,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (inline blockdev-add, explicit defaults) ===
 
@@ -43,6 +44,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v3 non-default options) ===
 
@@ -65,6 +67,7 @@ Format specific information:
     lazy refcounts: true
     refcount bits: 1
     corrupt: false
+    extended l2: false
 
 === Successful image creation (v2 non-default options) ===
 
@@ -141,6 +144,7 @@ Format specific information:
         payload offset: 528384
         master key iters: XXX
     corrupt: false
+    extended l2: false
 
 === Invalid BlockdevRef ===
 
diff --git a/tests/qemu-iotests/242.out b/tests/qemu-iotests/242.out
index 7ac8404d11..0d32dd9148 100644
--- a/tests/qemu-iotests/242.out
+++ b/tests/qemu-iotests/242.out
@@ -15,6 +15,7 @@ Format specific information:
     lazy refcounts: false
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 No bitmap in JSON format output
 
@@ -40,6 +41,7 @@ Format specific information:
             granularity: 32768
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -77,6 +79,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -119,6 +122,7 @@ Format specific information:
             granularity: 65536
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 The same bitmaps in JSON format:
 [
@@ -162,5 +166,6 @@ Format specific information:
             granularity: 16384
     refcount bits: 16
     corrupt: false
+    extended l2: false
 
 Test complete
diff --git a/tests/qemu-iotests/255.out b/tests/qemu-iotests/255.out
index 348909fdef..4e1b917a0f 100644
--- a/tests/qemu-iotests/255.out
+++ b/tests/qemu-iotests/255.out
@@ -3,9 +3,9 @@ Finishing a commit job with background reads
 
 === Create backing chain and start VM ===
 
-Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2.mid', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-t.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 === Start background read requests ===
 
@@ -23,9 +23,9 @@ Closing the VM while a job is being cancelled
 
 === Create images and start VM ===
 
-Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-src.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
-Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-dst.qcow2', fmt=qcow2 size=134217728 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 wrote 1048576/1048576 bytes at offset 0
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/280.out b/tests/qemu-iotests/280.out
index 5d382faaa8..ef1aad1ae1 100644
--- a/tests/qemu-iotests/280.out
+++ b/tests/qemu-iotests/280.out
@@ -1,4 +1,4 @@
-Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off refcount_bits=16
+Formatting 'TEST_DIR/PID-base', fmt=qcow2 size=67108864 cluster_size=65536 lazy_refcounts=off extended_l2=off refcount_bits=16
 
 === Launch VM ===
 Enabling migration QMP events on VM...
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3f8ee3e5f7..db9793e7d9 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -146,6 +146,7 @@ _filter_img_create()
         -e "s# adapter_type=[^ ]*##g" \
         -e "s# hwversion=[^ ]*##g" \
         -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+        -e "s# extended_l2=\\(on\\|off\\)##g" \
         -e "s# block_size=[0-9]\\+##g" \
         -e "s# block_state_zero=\\(on\\|off\\)##g" \
         -e "s# log_size=[0-9]\\+##g" \
-- 
2.20.1




* [PATCH v4 29/30] qcow2: Add subcluster support to qcow2_measure()
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (27 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 28/30] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-03-17 18:16 ` [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
  2020-04-21  5:06 ` [PATCH v4 00/30] Add subcluster allocation to qcow2 Derek Su
  30 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

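Extended L2 entries are 16 bytes long, twice the size of normal L2
entries, so they increase the amount of metadata needed for a qcow2
file. Take this into account when calculating the size of the L2 and
L1 tables in qcow2_measure() and qcow2_calc_prealloc_size().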

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
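A rough sketch of the L2/L1 sizing arithmetic that
qcow2_calc_prealloc_size() performs with this change, written as a
shell snippet so the numbers can be checked by hand. The function name
l2_l1_metadata_bytes is hypothetical, and the literal entry sizes (8
bytes for L2E_SIZE_NORMAL, 16 bytes for L2E_SIZE_EXTENDED) are
assumptions taken from the offsets used in iotest 271. The header and
the refcount metadata are left out.

```shell
#!/bin/bash
# Sketch only: mirrors the L2/L1 part of qcow2_calc_prealloc_size().
# Assumption: L2 entries are 8 bytes (normal) or 16 bytes (extended),
# and L1 entries are always 8 bytes.

# round_up N D: round N up to the next multiple of D
round_up() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# l2_l1_metadata_bytes TOTAL_SIZE CLUSTER_SIZE L2E_SIZE
# Prints "<bytes of L2 tables> <bytes of L1 table>".
l2_l1_metadata_bytes() {
    local total=$1 cluster=$2 l2e=$3
    local aligned nl2e nl1e
    aligned=$(round_up "$total" "$cluster")
    # One L2 entry per guest cluster, rounded up to whole L2 tables
    nl2e=$(( aligned / cluster ))
    nl2e=$(round_up "$nl2e" $(( cluster / l2e )))
    # One L1 entry per L2 table, rounded up to whole clusters
    nl1e=$(( nl2e * l2e / cluster ))
    nl1e=$(round_up "$nl1e" $(( cluster / 8 )))
    echo "$(( nl2e * l2e )) $(( nl1e * 8 ))"
}

l2_l1_metadata_bytes $(( 512 * 1024 * 1024 )) 65536 8    # prints "65536 65536"
l2_l1_metadata_bytes $(( 512 * 1024 * 1024 )) 65536 16   # prints "131072 65536"
```

With 64k clusters a 512MB image goes from one L2 table to two, while
the L1 table still fits in a single cluster.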
 block/qcow2.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
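
The size check in qcow2_measure() compares the L1 table size against
QCOW_MAX_L1_SIZE, so halving the number of entries per L2 table also
halves the largest image that passes the check. A small sketch of that
arithmetic follows. The helper name max_virtual_size is hypothetical,
and the 32 MiB value used for QCOW_MAX_L1_SIZE is an assumption that
does not appear in this patch.

```shell
#!/bin/bash
# Sketch only: largest virtual size whose L1 table still fits in the
# L1 size limit. Assumption: the limit is 32 MiB and L1 entries are
# 8 bytes each.

# max_virtual_size CLUSTER_SIZE L2E_SIZE [MAX_L1_BYTES]
max_virtual_size() {
    local cluster=$1 l2e=$2 max_l1=${3:-33554432}   # 32 MiB (assumed)
    # Each L1 entry points to one L2 table
    local l1_entries=$(( max_l1 / 8 ))
    # Guest bytes mapped by one L2 table: entries per table * cluster size
    local bytes_per_l2_table=$(( cluster / l2e * cluster ))
    echo $(( l1_entries * bytes_per_l2_table ))
}

max_virtual_size 65536 16   # extended_l2=on, 64k clusters
max_virtual_size 65536 8    # extended_l2=off, 64k clusters
```

Under these assumptions the first call yields 2^50 bytes, which agrees
with the 1024 TB limit that iotest 271 expects for extended_l2=on and
64k clusters.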

diff --git a/block/qcow2.c b/block/qcow2.c
index 77b2713533..aefac85b23 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3154,28 +3154,31 @@ int64_t qcow2_refcount_metadata_size(int64_t clusters, size_t cluster_size,
  * @total_size: virtual disk size in bytes
  * @cluster_size: cluster size in bytes
  * @refcount_order: refcount bits power-of-2 exponent
+ * @extended_l2: true if the image has extended L2 entries
  *
  * Returns: Total number of bytes required for the fully allocated image
  * (including metadata).
  */
 static int64_t qcow2_calc_prealloc_size(int64_t total_size,
                                         size_t cluster_size,
-                                        int refcount_order)
+                                        int refcount_order,
+                                        bool extended_l2)
 {
     int64_t meta_size = 0;
     uint64_t nl1e, nl2e;
     int64_t aligned_total_size = ROUND_UP(total_size, cluster_size);
+    size_t l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
 
     /* header: 1 cluster */
     meta_size += cluster_size;
 
     /* total size of L2 tables */
     nl2e = aligned_total_size / cluster_size;
-    nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
-    meta_size += nl2e * sizeof(uint64_t);
+    nl2e = ROUND_UP(nl2e, cluster_size / l2e_size);
+    meta_size += nl2e * l2e_size;
 
     /* total size of L1 tables */
-    nl1e = nl2e * sizeof(uint64_t) / cluster_size;
+    nl1e = nl2e * l2e_size / cluster_size;
     nl1e = ROUND_UP(nl1e, cluster_size / sizeof(uint64_t));
     meta_size += nl1e * sizeof(uint64_t);
 
@@ -4680,6 +4683,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     bool has_backing_file;
     bool has_luks;
     bool extended_l2;
+    size_t l2e_size;
 
     /* Parse image creation options */
     extended_l2 = qemu_opt_get_bool_del(opts, BLOCK_OPT_EXTL2, false);
@@ -4748,8 +4752,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     virtual_size = ROUND_UP(virtual_size, cluster_size);
 
     /* Check that virtual disk size is valid */
+    l2e_size = extended_l2 ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
     l2_tables = DIV_ROUND_UP(virtual_size / cluster_size,
-                             cluster_size / sizeof(uint64_t));
+                             cluster_size / l2e_size);
     if (l2_tables * sizeof(uint64_t) > QCOW_MAX_L1_SIZE) {
         error_setg(&local_err, "The image size is too large "
                                "(try using a larger cluster size)");
@@ -4812,9 +4817,9 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     }
 
     info = g_new(BlockMeasureInfo, 1);
-    info->fully_allocated =
+    info->fully_allocated = luks_payload_size +
         qcow2_calc_prealloc_size(virtual_size, cluster_size,
-                                 ctz32(refcount_bits)) + luks_payload_size;
+                                 ctz32(refcount_bits), extended_l2);
 
     /* Remove data clusters that are not required.  This overestimates the
      * required size because metadata needed for the fully allocated file is
-- 
2.20.1




* [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (28 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 29/30] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
@ 2020-03-17 18:16 ` Alberto Garcia
  2020-04-09 12:22   ` Max Reitz
  2020-04-21  5:06 ` [PATCH v4 00/30] Add subcluster allocation to qcow2 Derek Su
  30 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-03-17 18:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, Alberto Garcia, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
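The test compares the 64-bit subcluster bitmap of an L2 entry against
a value built from two lists of subcluster indexes. As a standalone
sketch of that encoding (the function name expected_bitmap is
hypothetical; the layout, with 'allocated' bits in the low 32 bits and
'zero' bits in the high 32 bits, follows _verify_l2_bitmap in the
test):

```shell
#!/bin/bash
# Sketch only: build the expected subcluster bitmap from index lists.
# Bits 0-31 mean "subcluster is allocated", bits 32-63 mean
# "subcluster reads as zeroes".

# expected_bitmap "ALLOC INDEXES" "ZERO INDEXES" -> 16 hex digits
expected_bitmap() {
    local bitmap=0 bit
    for bit in $1; do
        bitmap=$(( bitmap | (1 << bit) ))
    done
    for bit in $2; do
        bitmap=$(( bitmap | (1 << (32 + bit)) ))
    done
    # %x prints the two's complement pattern even when bit 63 is set
    printf '%016x\n' "$bitmap"
}

expected_bitmap "$(seq 0 9) 16 31" ""   # prints 00000000800103ff
expected_bitmap "1" "0 $(seq 2 31)"     # prints fffffffd00000002
```

Both values appear in the reference output in 271.out for the
'write -q -P 8 63k 4k' and 'write -q -P 10 3k 512' steps.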
 tests/qemu-iotests/271     | 359 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/271.out | 244 +++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 604 insertions(+)
 create mode 100755 tests/qemu-iotests/271
 create mode 100644 tests/qemu-iotests/271.out
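
To check which 'allocated' bits a given write in the test is expected
to set, the offset and length can be mapped to subcluster indexes. A
minimal sketch, assuming 32 subclusters per cluster and the default
64k cluster size (the helper name touched_subclusters is
hypothetical):

```shell
#!/bin/bash
# Sketch only: list the subclusters touched by a request.
# Assumption: every cluster is split into 32 equal subclusters.

# touched_subclusters OFFSET LENGTH [CLUSTER_SIZE]
# Prints space-separated "cluster:subcluster" pairs.
touched_subclusters() {
    local offset=$1 length=$2 cluster=${3:-65536}
    local sc_size=$(( cluster / 32 ))
    local first=$(( offset / sc_size ))
    local last=$(( (offset + length - 1) / sc_size ))
    local i out=""
    for (( i = first; i <= last; i++ )); do
        out+="$(( i / 32 )):$(( i % 32 )) "
    done
    echo "${out% }"
}

touched_subclusters $(( 63 * 1024 )) $(( 4 * 1024 ))   # prints "0:31 1:0 1:1"
```

The 'write 63k 4k' request therefore touches subcluster #31 of cluster
#0 and subclusters #0 and #1 of cluster #1, matching the two L2
entries checked after that write.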

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
new file mode 100755
index 0000000000..48f4d8d8ce
--- /dev/null
+++ b/tests/qemu-iotests/271
@@ -0,0 +1,359 @@
+#!/bin/bash
+#
+# Test qcow2 images with extended L2 entries
+#
+# Copyright (C) 2019-2020 Igalia, S.L.
+# Author: Alberto Garcia <berto@igalia.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=berto@igalia.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    rm -f "$TEST_IMG.raw"
+    rm -f "$TEST_IMG.backing"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file nfs
+_supported_os Linux
+
+IMGOPTS="extended_l2=on"
+l2_offset=262144 # 0x40000
+
+_verify_img()
+{
+    $QEMU_IMG compare "$TEST_IMG" "$TEST_IMG.raw" | grep -v 'Images are identical'
+    $QEMU_IMG check "$TEST_IMG" | _filter_qemu_img_check | \
+        grep -v 'No errors were found on the image'
+}
+
+# Compare the bitmap of an extended L2 entry against an expected value
+_verify_l2_bitmap()
+{
+    entry_no="$1"        # L2 entry number, starting from 0
+    expected_alloc="$2"  # Space-separated list of allocated subcluster indexes
+    expected_zero="$3"   # Space-separated list of zero subcluster indexes
+
+    offset=$(($l2_offset + $entry_no * 16))
+    entry=`peek_file_be "$TEST_IMG" $offset 8`
+    offset=$(($offset + 8))
+    bitmap=`peek_file_be "$TEST_IMG" $offset 8`
+
+    expected_bitmap=0
+    for bit in $expected_alloc; do
+        expected_bitmap=$(($expected_bitmap | (1 << $bit)))
+    done
+    for bit in $expected_zero; do
+        expected_bitmap=$(($expected_bitmap | (1 << (32 + $bit))))
+    done
+    expected_bitmap=`printf "%llu" $expected_bitmap`
+
+    printf "L2 entry #%d: 0x%016lx %016lx\n" "$entry_no" "$entry" "$bitmap"
+    if [ "$bitmap" != "$expected_bitmap" ]; then
+        printf "ERROR: expecting bitmap       0x%016lx\n" "$expected_bitmap"
+    fi
+}
+
+_test_write()
+{
+    cmd="$1"
+    alloc_bitmap="$2"
+    zero_bitmap="$3"
+    l2_entry_idx="$4"
+    [ -n "$l2_entry_idx" ] || l2_entry_idx=0
+    raw_cmd=`echo $cmd | sed s/-c//` # Raw images don't support -c
+    echo "$cmd"
+    $QEMU_IO -c "$cmd" "$TEST_IMG" | _filter_qemu_io
+    $QEMU_IO -c "$raw_cmd" -f raw "$TEST_IMG.raw" | _filter_qemu_io
+    _verify_img
+    _verify_l2_bitmap "$l2_entry_idx" "$alloc_bitmap" "$zero_bitmap"
+}
+
+_reset_img()
+{
+    $QEMU_IMG create -f raw "$TEST_IMG.raw" 1M | _filter_img_create
+    if [ "$use_backing_file" = "yes" ]; then
+        $QEMU_IMG create -f raw "$TEST_IMG.backing" 1M | _filter_img_create
+        $QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.backing" | _filter_qemu_io
+        $QEMU_IO -c 'write -q -P 0xFF 0 1M' -f raw "$TEST_IMG.raw" | _filter_qemu_io
+        _make_test_img -b "$TEST_IMG.backing" 1M
+    else
+        _make_test_img 1M
+    fi
+}
+
+# Test that writing to an image with subclusters produces the expected
+# results, in images with and without backing files
+for use_backing_file in yes no; do
+    echo
+    echo "### Standard write tests (backing file: $use_backing_file) ###"
+    echo
+    _reset_img
+    ### Write subcluster #0 (beginning of subcluster) ###
+    alloc="0"; zero=""
+    _test_write 'write -q -P 1 0 1k' "$alloc" "$zero"
+
+    ### Write subcluster #1 (middle of subcluster) ###
+    alloc="0 1"; zero=""
+    _test_write 'write -q -P 2 3k 512' "$alloc" "$zero"
+
+    ### Write subcluster #2 (end of subcluster) ###
+    alloc="0 1 2"; zero=""
+    _test_write 'write -q -P 3 5k 1k' "$alloc" "$zero"
+
+    ### Write subcluster #3 (full subcluster) ###
+    alloc="0 1 2 3"; zero=""
+    _test_write 'write -q -P 4 6k 2k' "$alloc" "$zero"
+
+    ### Write subclusters #4-6 (full subclusters) ###
+    alloc="`seq 0 6`"; zero=""
+    _test_write 'write -q -P 5 8k 6k' "$alloc" "$zero"
+
+    ### Write subclusters #7-9 (partial subclusters) ###
+    alloc="`seq 0 9`"; zero=""
+    _test_write 'write -q -P 6 15k 4k' "$alloc" "$zero"
+
+    ### Write subcluster #16 (partial subcluster) ###
+    alloc="`seq 0 9` 16"; zero=""
+    _test_write 'write -q -P 7 32k 1k' "$alloc" "$zero"
+
+    ### Write subcluster #31-#33 (cluster overlap) ###
+    alloc="`seq 0 9` 16 31"; zero=""
+    _test_write 'write -q -P 8 63k 4k' "$alloc" "$zero"
+    alloc="0 1" ; zero=""
+    _verify_l2_bitmap 1 "$alloc" "$zero"
+
+    ### Zero subcluster #1 (TODO: use the "all zeros" bit)
+    alloc="`seq 0 9` 16 31"; zero=""
+    _test_write 'write -q -z 2k 2k' "$alloc" "$zero"
+
+    ### Zero cluster #0
+    alloc=""; zero="`seq 0 31`"
+    _test_write 'write -q -z 0 64k' "$alloc" "$zero"
+
+    ### Fill cluster #0 with data
+    alloc="`seq 0 31`"; zero=""
+    _test_write 'write -q -P 9 0 64k' "$alloc" "$zero"
+
+    ### Zero and unmap half of cluster #0 (this won't unmap it)
+    alloc="`seq 0 31`"; zero=""
+    _test_write 'write -q -z -u 0 32k' "$alloc" "$zero"
+
+    ### Zero and unmap cluster #0
+    alloc=""; zero="`seq 0 31`"
+    _test_write 'write -q -z -u 0 64k' "$alloc" "$zero"
+
+    ### Write subcluster #1 (middle of subcluster)
+    alloc="1"; zero="0 `seq 2 31`"
+    _test_write 'write -q -P 10 3k 512' "$alloc" "$zero"
+
+    ### Fill cluster #0 with data
+    alloc="`seq 0 31`"; zero=""
+    _test_write 'write -q -P 11 0 64k' "$alloc" "$zero"
+
+    ### Discard cluster #0
+    alloc=""; zero="`seq 0 31`"
+    _test_write 'discard -q 0 64k' "$alloc" "$zero"
+
+    ### Write compressed data to cluster #0
+    alloc=""; zero=""
+    _test_write 'write -q -c -P 12 0 64k' "$alloc" "$zero"
+
+    ### Write subcluster #1 (middle of subcluster)
+    alloc="`seq 0 31`"; zero=""
+    _test_write 'write -q -P 13 3k 512' "$alloc" "$zero"
+
+    ### Zeroize an unallocated cluster (#2)
+    alloc=""; zero="`seq 0 31`"
+    _test_write 'write -q -z 128k 64k' "$alloc" "$zero" 2
+
+    ### Partially zeroize an unallocated cluster (#3)
+    if [ "$use_backing_file" = "yes" ]; then
+        alloc="`seq 0 15`"; zero=""
+    else
+        alloc=""; zero="`seq 0 31`"
+    fi
+    _test_write 'write -q -z 192k 32k' "$alloc" "$zero" 3
+done
+
+for use_backing_file in yes no; do
+    echo
+    echo "### Discarding clusters with non-zero bitmaps (backing file: $use_backing_file) ###"
+    echo
+    if [ "$use_backing_file" = "yes" ]; then
+        _make_test_img -b "$TEST_IMG.backing" 1M
+    else
+        _make_test_img 1M
+    fi
+    # Write clusters #0-#1 and then discard them
+    $QEMU_IO -c 'write -q 0 128k' "$TEST_IMG"
+    $QEMU_IO -c 'discard -q 0 128k' "$TEST_IMG"
+    # 'qemu-io discard' doesn't do a full discard, it zeroizes the
+    # cluster, so both clusters have all zero bits set now
+    alloc=""; zero="`seq 0 31`"
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    _verify_l2_bitmap 1 "$alloc" "$zero"
+    # Now clear the 'zero' bits of subclusters #16-#31 of cluster #0
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x00\x00"
+    # Discard cluster #0 again to see how the zero bits have changed
+    $QEMU_IO -c 'discard -q 0 64k' "$TEST_IMG"
+    # And do a full discard of cluster #1 by shrinking and growing the image
+    $QEMU_IMG resize --shrink "$TEST_IMG" 64k
+    $QEMU_IMG resize "$TEST_IMG" 1M
+    # A normal discard sets all 'zero' bits only if the image has a
+    # backing file, otherwise it won't touch them.
+    if [ "$use_backing_file" = "yes" ]; then
+        alloc=""; zero="`seq 0 31`"
+    else
+        alloc=""; zero="`seq 0 15`"
+    fi
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    # A full discard should clear the L2 entry completely
+    alloc=""; zero=""
+    _verify_l2_bitmap 1 "$alloc" "$zero"
+done
+
+# Test that corrupted L2 entries are detected in both read and write
+# operations
+for corruption_test_cmd in read write; do
+    echo
+    echo "### Corrupted L2 entries - $corruption_test_cmd test (allocated) ###"
+    echo
+    echo "# 'cluster is zero' bit set on the standard cluster descriptor"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q 0 2k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+7)) "\x01"
+    alloc="0"; zero=""
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# Both 'subcluster is zero' and 'subcluster is allocated' bits set"
+    echo
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q 0 2k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+11)) "\x01"
+    alloc="0"; zero="0"
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "### Corrupted L2 entries - $corruption_test_cmd test (unallocated) ###"
+    echo
+    echo "# 'cluster is zero' bit set on the standard cluster descriptor"
+    echo
+    _make_test_img 1M
+    # We want to corrupt the (empty) L2 entry from cluster #0,
+    # but we write to #4 in order to initialize the L2 table first
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+7)) "\x01"
+    alloc=""; zero=""
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# 'subcluster is allocated' bit set"
+    echo
+    _make_test_img 1M
+    # We want to corrupt the (empty) L2 entry from cluster #0,
+    # but we write to #4 in order to initialize the L2 table first
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+15)) "\x01"
+    alloc="0"; zero=""
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "# Both 'subcluster is zero' and 'subcluster is allocated' bits set"
+    echo
+    _make_test_img 1M
+    # We want to corrupt the (empty) L2 entry from cluster #0,
+    # but we write to #4 in order to initialize the L2 table first
+    $QEMU_IO -c 'write -q 256k 1k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+8)) "\x00\x00\x00\x01\x00\x00\x00\x01"
+    alloc="0"; zero="0"
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    $QEMU_IO -c "$corruption_test_cmd 0 1k" "$TEST_IMG"
+
+    echo
+    echo "### Compressed cluster with subcluster bitmap != 0 - $corruption_test_cmd test ###"
+    echo
+    # We actually don't consider this a corrupted image.
+    # The bitmap in compressed clusters is unused so QEMU should just ignore it.
+    _make_test_img 1M
+    $QEMU_IO -c 'write -q -P 11 -c 0 64k' "$TEST_IMG"
+    poke_file "$TEST_IMG" $(($l2_offset+11)) "\x01\x01"
+    alloc="24"; zero="0"
+    _verify_l2_bitmap 0 "$alloc" "$zero"
+    $QEMU_IO -c "$corruption_test_cmd -P 11 0 64k" "$TEST_IMG" | _filter_qemu_io
+done
+
+echo
+echo "### Image creation options ###"
+echo
+echo "# cluster_size < 16k"
+IMGOPTS="extended_l2=on,cluster_size=8k" _make_test_img 1M
+
+echo "# backing file and preallocation=metadata"
+IMGOPTS="extended_l2=on,preallocation=metadata" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo "# backing file and preallocation=falloc"
+IMGOPTS="extended_l2=on,preallocation=falloc" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo "# backing file and preallocation=full"
+IMGOPTS="extended_l2=on,preallocation=full" _make_test_img -b "$TEST_IMG.backing" 1M
+
+echo
+echo "### qemu-img measure ###"
+echo
+echo "# 512MB, extended_l2=off" # This needs one L2 table
+$QEMU_IMG measure --size 512M -O qcow2 -o extended_l2=off
+echo "# 512MB, extended_l2=on"  # This needs two L2 tables
+$QEMU_IMG measure --size 512M -O qcow2 -o extended_l2=on
+
+echo "# 16K clusters, 64GB, extended_l2=off" # The L1 table fits in one cluster
+$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=off
+echo "# 16K clusters, 64GB, extended_l2=on"  # The L1 table needs two clusters
+$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=on
+
+echo "# 8k clusters" # This should fail
+$QEMU_IMG measure --size 1M -O qcow2 -o cluster_size=8k,extended_l2=on
+
+echo "# 1024 TB" # Maximum allowed size with extended_l2=on and 64K clusters
+$QEMU_IMG measure --size 1024T -O qcow2 -o extended_l2=on
+echo "# 1025 TB" # This should fail
+$QEMU_IMG measure --size 1025T -O qcow2 -o extended_l2=on
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
+
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
new file mode 100644
index 0000000000..c36dcaafc4
--- /dev/null
+++ b/tests/qemu-iotests/271.out
@@ -0,0 +1,244 @@
+QA output created by 271
+
+### Standard write tests (backing file: yes) ###
+
+Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT.backing', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing
+write -q -P 1 0 1k
+L2 entry #0: 0x8000000000050000 0000000000000001
+write -q -P 2 3k 512
+L2 entry #0: 0x8000000000050000 0000000000000003
+write -q -P 3 5k 1k
+L2 entry #0: 0x8000000000050000 0000000000000007
+write -q -P 4 6k 2k
+L2 entry #0: 0x8000000000050000 000000000000000f
+write -q -P 5 8k 6k
+L2 entry #0: 0x8000000000050000 000000000000007f
+write -q -P 6 15k 4k
+L2 entry #0: 0x8000000000050000 00000000000003ff
+write -q -P 7 32k 1k
+L2 entry #0: 0x8000000000050000 00000000000103ff
+write -q -P 8 63k 4k
+L2 entry #0: 0x8000000000050000 00000000800103ff
+L2 entry #1: 0x8000000000060000 0000000000000003
+write -q -z 2k 2k
+L2 entry #0: 0x8000000000050000 00000000800103ff
+write -q -z 0 64k
+L2 entry #0: 0x8000000000050000 ffffffff00000000
+write -q -P 9 0 64k
+L2 entry #0: 0x8000000000050000 00000000ffffffff
+write -q -z -u 0 32k
+L2 entry #0: 0x8000000000050000 00000000ffffffff
+write -q -z -u 0 64k
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+write -q -P 10 3k 512
+L2 entry #0: 0x8000000000050000 fffffffd00000002
+write -q -P 11 0 64k
+L2 entry #0: 0x8000000000050000 00000000ffffffff
+discard -q 0 64k
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+write -q -c -P 12 0 64k
+L2 entry #0: 0x4000000000050000 0000000000000000
+write -q -P 13 3k 512
+L2 entry #0: 0x8000000000070000 00000000ffffffff
+write -q -z 128k 64k
+L2 entry #2: 0x0000000000000000 ffffffff00000000
+write -q -z 192k 32k
+L2 entry #3: 0x8000000000050000 000000000000ffff
+
+### Standard write tests (backing file: no) ###
+
+Formatting 'TEST_DIR/t.IMGFMT.raw', fmt=raw size=1048576
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+write -q -P 1 0 1k
+L2 entry #0: 0x8000000000050000 0000000000000001
+write -q -P 2 3k 512
+L2 entry #0: 0x8000000000050000 0000000000000003
+write -q -P 3 5k 1k
+L2 entry #0: 0x8000000000050000 0000000000000007
+write -q -P 4 6k 2k
+L2 entry #0: 0x8000000000050000 000000000000000f
+write -q -P 5 8k 6k
+L2 entry #0: 0x8000000000050000 000000000000007f
+write -q -P 6 15k 4k
+L2 entry #0: 0x8000000000050000 00000000000003ff
+write -q -P 7 32k 1k
+L2 entry #0: 0x8000000000050000 00000000000103ff
+write -q -P 8 63k 4k
+L2 entry #0: 0x8000000000050000 00000000800103ff
+L2 entry #1: 0x8000000000060000 0000000000000003
+write -q -z 2k 2k
+L2 entry #0: 0x8000000000050000 00000000800103ff
+write -q -z 0 64k
+L2 entry #0: 0x8000000000050000 ffffffff00000000
+write -q -P 9 0 64k
+L2 entry #0: 0x8000000000050000 00000000ffffffff
+write -q -z -u 0 32k
+L2 entry #0: 0x8000000000050000 00000000ffffffff
+write -q -z -u 0 64k
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+write -q -P 10 3k 512
+L2 entry #0: 0x8000000000050000 fffffffd00000002
+write -q -P 11 0 64k
+L2 entry #0: 0x8000000000050000 00000000ffffffff
+discard -q 0 64k
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+write -q -c -P 12 0 64k
+L2 entry #0: 0x4000000000050000 0000000000000000
+write -q -P 13 3k 512
+L2 entry #0: 0x8000000000070000 00000000ffffffff
+write -q -z 128k 64k
+L2 entry #2: 0x0000000000000000 ffffffff00000000
+write -q -z 192k 32k
+L2 entry #3: 0x0000000000000000 ffffffff00000000
+
+### Discarding clusters with non-zero bitmaps (backing file: yes) ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+L2 entry #1: 0x0000000000000000 ffffffff00000000
+Image resized.
+Image resized.
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+L2 entry #1: 0x0000000000000000 0000000000000000
+
+### Discarding clusters with non-zero bitmaps (backing file: no) ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000000 ffffffff00000000
+L2 entry #1: 0x0000000000000000 ffffffff00000000
+Image resized.
+Image resized.
+L2 entry #0: 0x0000000000000000 0000ffff00000000
+L2 entry #1: 0x0000000000000000 0000000000000000
+
+### Corrupted L2 entries - read test (allocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x8000000000050001 0000000000000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x8000000000050000 0000000100000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Corrupted L2 entries - read test (unallocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000001 0000000000000000
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# 'subcluster is allocated' bit set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000000 0000000000000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000000 0000000100000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+read failed: Input/output error
+
+### Compressed cluster with subcluster bitmap != 0 - read test ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x4000000000050000 0000000101000000
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+### Corrupted L2 entries - write test (allocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x8000000000050001 0000000000000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x8000000000050000 0000000100000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Corrupted L2 entries - write test (unallocated) ###
+
+# 'cluster is zero' bit set on the standard cluster descriptor
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000001 0000000000000000
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# 'subcluster is allocated' bit set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000000 0000000000000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+# Both 'subcluster is zero' and 'subcluster is allocated' bits set
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x0000000000000000 0000000100000001
+qcow2: Marking image as corrupt: Invalid cluster entry found  (L2 offset: 0x40000, L2 index: 0); further corruption events will be suppressed
+write failed: Input/output error
+
+### Compressed cluster with subcluster bitmap != 0 - write test ###
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+L2 entry #0: 0x4000000000050000 0000000101000000
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+### Image creation options ###
+
+# cluster_size < 16k
+qemu-img: TEST_DIR/t.IMGFMT: Extended L2 entries are only supported with cluster sizes of at least 16384 bytes
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+# backing file and preallocation=metadata
+qemu-img: TEST_DIR/t.IMGFMT: Backing file and preallocation cannot be used at the same time
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=metadata
+# backing file and preallocation=falloc
+qemu-img: TEST_DIR/t.IMGFMT: Backing file and preallocation cannot be used at the same time
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=falloc
+# backing file and preallocation=full
+qemu-img: TEST_DIR/t.IMGFMT: Backing file and preallocation cannot be used at the same time
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.backing preallocation=full
+
+### qemu-img measure ###
+
+# 512MB, extended_l2=off
+required size: 327680
+fully allocated size: 537198592
+# 512MB, extended_l2=on
+required size: 393216
+fully allocated size: 537264128
+# 16K clusters, 64GB, extended_l2=off
+required size: 42008576
+fully allocated size: 68761485312
+# 16K clusters, 64GB, extended_l2=on
+required size: 75579392
+fully allocated size: 68795056128
+# 8k clusters
+qemu-img: Extended L2 entries are only supported with cluster sizes of at least 16384 bytes
+# 1024 TB
+required size: 309285027840
+fully allocated size: 1126209191870464
+# 1025 TB
+qemu-img: The image size is too large (try using a larger cluster size)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ec2b2302e5..d7b0e03737 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -284,6 +284,7 @@
 267 rw auto quick snapshot
 268 rw auto quick
 270 rw backing quick
+271 rw auto
 272 rw
 273 backing quick
 277 rw quick
-- 
2.20.1




* Re: [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset
  2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
@ 2020-03-18 11:23   ` Eric Blake
  2020-04-08 10:23   ` Max Reitz
  2020-04-09  6:49   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 128+ messages in thread
From: Eric Blake @ 2020-03-18 11:23 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 3/17/20 1:15 PM, Alberto Garcia wrote:
> The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
> host offset. In practice this is not very useful because all users(*)
> of this structure need the final host offset into the cluster, which
> they calculate using
> 
>     host_offset = file_cluster_offset + offset_into_cluster(s, offset)
> 
> There is no reason why Qcow2AioTask cannot store host_offset directly
> and that is what this patch does.
> 
> (*) compressed clusters are the exception: in this case what
>      file_cluster_offset was storing was the full compressed cluster
>      descriptor (offset + size). This does not change with this patch
>      but it is documented now.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.c | 68 +++++++++++++++++++++++++--------------------------
>   1 file changed, 33 insertions(+), 35 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
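
For reference, the calculation quoted in the commit message can be
sketched in standalone C. This is only an illustration under
assumptions: SketchState and the sketch_* helpers are hypothetical
stand-ins rather than the QEMU definitions, and only cluster_bits is
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; only cluster_bits is modelled. */
typedef struct {
    unsigned cluster_bits;   /* cluster_size == 1 << cluster_bits */
} SketchState;

/* Byte offset of 'offset' inside the cluster that contains it. */
static inline uint64_t sketch_offset_into_cluster(const SketchState *s,
                                                  uint64_t offset)
{
    return offset & ((UINT64_C(1) << s->cluster_bits) - 1);
}

/*
 * The value every non-compressed user used to compute by hand, and
 * which the task structure stores directly after this patch.
 */
static inline uint64_t sketch_host_offset(const SketchState *s,
                                          uint64_t file_cluster_offset,
                                          uint64_t guest_offset)
{
    /* The cluster offset is expected to be cluster-aligned. */
    assert(sketch_offset_into_cluster(s, file_cluster_offset) == 0);
    return file_cluster_offset + sketch_offset_into_cluster(s, guest_offset);
}
```

With 64 KiB clusters (cluster_bits = 16), a cluster at host offset
0x50000 and a guest offset of 0x12345 give a host offset of 0x52345.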

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-03-17 18:15 ` [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
@ 2020-03-18 12:08   ` Eric Blake
  2020-04-08 10:51   ` Max Reitz
  2020-04-09  7:50   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 128+ messages in thread
From: Eric Blake @ 2020-03-18 12:08 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 3/17/20 1:15 PM, Alberto Garcia wrote:
> qcow2_get_cluster_offset() takes an (unaligned) guest offset and
> returns the (aligned) offset of the corresponding cluster in the qcow2
> image.
> 
> In practice none of the callers need to know where the cluster starts
> so this patch makes the function calculate and return the final host
> offset directly. The function is also renamed accordingly.
> 
> There is a pre-existing exception with compressed clusters: in this
> case the function returns the complete cluster descriptor (containing
> the offset and size of the compressed data). This does not change with
> this patch but it is now documented.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.h         |  4 ++--
>   block/qcow2-cluster.c | 38 ++++++++++++++++++++++----------------
>   block/qcow2.c         | 24 +++++++-----------------
>   3 files changed, 31 insertions(+), 35 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset
  2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
  2020-03-18 11:23   ` Eric Blake
@ 2020-04-08 10:23   ` Max Reitz
  2020-04-09  6:49   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-08 10:23 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:15, Alberto Garcia wrote:
> The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
> host offset. In practice this is not very useful because all users(*)
> of this structure need the final host offset into the cluster, which
> they calculate using
> 
>    host_offset = file_cluster_offset + offset_into_cluster(s, offset)
> 
> There is no reason why Qcow2AioTask cannot store host_offset directly
> and that is what this patch does.
> 
> (*) compressed clusters are the exception: in this case what
>     file_cluster_offset was storing was the full compressed cluster
>     descriptor (offset + size). This does not change with this patch
>     but it is documented now.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 68 +++++++++++++++++++++++++--------------------------
>  1 file changed, 33 insertions(+), 35 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d44b45633d..a00b0c8e45 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2409,8 +2410,7 @@ static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
>          }
>          qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
>  
> -        if (qcow2_co_encrypt(bs, file_cluster_offset + offset_in_cluster,
> -                             offset, crypt_buf, bytes) < 0)
> +        if (qcow2_co_encrypt(bs, host_offset, offset, crypt_buf, bytes) < 0)
>          {

This { should now go on the preceding line; with that fixed:

Reviewed-by: Max Reitz <mreitz@redhat.com>

>              ret = -EIO;
>              goto out_unlocked;



* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-03-17 18:15 ` [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
  2020-03-18 12:08   ` Eric Blake
@ 2020-04-08 10:51   ` Max Reitz
  2020-04-08 17:29     ` Alberto Garcia
  2020-04-09  7:57     ` Vladimir Sementsov-Ogievskiy
  2020-04-09  7:50   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 2 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-08 10:51 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:15, Alberto Garcia wrote:
> qcow2_get_cluster_offset() takes an (unaligned) guest offset and
> returns the (aligned) offset of the corresponding cluster in the qcow2
> image.
> 
> In practice none of the callers need to know where the cluster starts
> so this patch makes the function calculate and return the final host
> offset directly. The function is also renamed accordingly.
> 
> There is a pre-existing exception with compressed clusters: in this
> case the function returns the complete cluster descriptor (containing
> the offset and size of the compressed data). This does not change with
> this patch but it is now documented.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h         |  4 ++--
>  block/qcow2-cluster.c | 38 ++++++++++++++++++++++----------------
>  block/qcow2.c         | 24 +++++++-----------------
>  3 files changed, 31 insertions(+), 35 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 0942126232..f47ef6ca4e 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h

[...]

>      case QCOW2_CLUSTER_ZERO_ALLOC:
>      case QCOW2_CLUSTER_NORMAL:
>          /* how many allocated clusters ? */
>          c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
>                                        &l2_slice[l2_index], QCOW_OFLAG_ZERO);
> -        *cluster_offset &= L2E_OFFSET_MASK;
> -        if (offset_into_cluster(s, *cluster_offset)) {
> +        *host_offset = l2_entry & L2E_OFFSET_MASK;
> +        if (offset_into_cluster(s, *host_offset)) {
>              qcow2_signal_corruption(bs, true, -1, -1,
>                                      "Cluster allocation offset %#"
>                                      PRIx64 " unaligned (L2 offset: %#" PRIx64
> -                                    ", L2 index: %#x)", *cluster_offset,
> +                                    ", L2 index: %#x)", *host_offset,
>                                      l2_offset, l2_index);
>              ret = -EIO;
>              goto fail;
>          }
> -        if (has_data_file(bs) && *cluster_offset != offset - offset_in_cluster)
> +        if (has_data_file(bs) && *host_offset != offset - offset_in_cluster)
>          {

(1) The { should be moved to the preceding line;

(2) I think it makes more sense to move the
“*host_offset += offset_in_cluster” before this condition, so it becomes
“... && *host_offset != offset”.

>              qcow2_signal_corruption(bs, true, -1, -1,
>                                      "External data file host cluster offset %#"

(Maybe we then need to drop the “cluster” from this line, but other than
that, it would fit with this error message.)

Max

>                                      PRIx64 " does not match guest cluster "
>                                      "offset: %#" PRIx64
> -                                    ", L2 index: %#x)", *cluster_offset,
> +                                    ", L2 index: %#x)", *host_offset,
>                                      offset - offset_in_cluster, l2_index);
>              ret = -EIO;
>              goto fail;
>          }
> +        *host_offset += offset_in_cluster;
>          break;
>      default:
>          abort();



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-03-17 18:16 ` [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature Alberto Garcia
@ 2020-04-08 11:09   ` Max Reitz
  2020-04-09 15:12   ` Eric Blake
  1 sibling, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-08 11:09 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>  docs/qcow2-cache.txt   | 19 +++++++++++-
>  2 files changed, 83 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State
  2020-03-17 18:16 ` [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
@ 2020-04-08 11:12   ` Max Reitz
  2020-04-10  9:45   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-08 11:12 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> This patch adds the following new fields to BDRVQcow2State:
> 
> - subclusters_per_cluster: Number of subclusters in a cluster
> - subcluster_size: The size of each subcluster, in bytes
> - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size
> 
> Images without subclusters are treated as if they had exactly one,
> with subcluster_size = cluster_size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 5 +++++
>  block/qcow2.c | 5 +++++
>  2 files changed, 10 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>
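
The relationship between the three fields can be sketched as follows.
This is a minimal illustration, not the patch itself: SketchFields and
sketch_init_fields() are hypothetical names, and the value 32 for the
number of subclusters per cluster in extended L2 images is an
assumption here (it matches a 64-bit bitmap with two bits per
subcluster).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed number of subclusters per cluster with extended L2 entries. */
#define SKETCH_EXTL2_SUBCLUSTERS_PER_CLUSTER 32

/* Hypothetical subset of the driver state. */
typedef struct {
    unsigned cluster_bits;
    unsigned subclusters_per_cluster;
    unsigned subcluster_size;   /* in bytes */
    unsigned subcluster_bits;   /* 1 << subcluster_bits == subcluster_size */
} SketchFields;

static inline SketchFields sketch_init_fields(unsigned cluster_bits,
                                              bool extended_l2)
{
    SketchFields f;
    unsigned size;

    f.cluster_bits = cluster_bits;
    /* Images without subclusters behave as if they had exactly one. */
    f.subclusters_per_cluster =
        extended_l2 ? SKETCH_EXTL2_SUBCLUSTERS_PER_CLUSTER : 1;
    f.subcluster_size = (1u << cluster_bits) / f.subclusters_per_cluster;

    /* Count how many times the size can be halved: log2 of a power of two. */
    f.subcluster_bits = 0;
    for (size = f.subcluster_size; size > 1; size >>= 1) {
        f.subcluster_bits++;
    }
    assert((1u << f.subcluster_bits) == f.subcluster_size);
    return f;
}
```

Under these assumptions a 64 KiB cluster splits into 2 KiB subclusters
(subcluster_bits = 11), and the 16 KiB minimum reported by the test
output gives 512-byte subclusters.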



* Re: [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2020-03-17 18:16 ` [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
@ 2020-04-08 11:23   ` Max Reitz
  2020-04-08 17:46     ` Alberto Garcia
  2020-04-14 11:10   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 128+ messages in thread
From: Max Reitz @ 2020-04-08 11:23 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> This patch adds QCow2SubclusterType, which is the subcluster-level
> version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
> same meaning as their QCOW2_CLUSTER_* equivalents (when they
> exist). See below for details and caveats.
> 
> In images without extended L2 entries clusters are treated as having
> exactly one subcluster so it is possible to replace one data type with
> the other while keeping the exact same semantics.
> 
> With extended L2 entries there are new possible values, and every
> subcluster in the same cluster can obviously have a different
> QCow2SubclusterType so functions need to be adapted to work on the
> subcluster level.
> 
> There are several things that have to be taken into account:
> 
>   a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
>      compressed. We do not support compression at the subcluster
>      level.
> 
>   b) There are two different values for unallocated subclusters:
>      QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
>      cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
>      which means that the cluster is allocated but the subcluster is
>      not. The latter can only happen in images with extended L2
>      entries.
> 
>   c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
>      entry has a value that violates the specification. The caller is
>      responsible for handling these situations.
> 
>      To prevent compatibility problems with images that have invalid
>      values but are currently being read by QEMU without causing side
>      effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
>      with extended L2 entries.
> 
> qcow2_cluster_to_subcluster_type() is added as a separate function
> from qcow2_get_subcluster_type(), but this is only temporary and both
> will be merged in a subsequent patch.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 120 insertions(+)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 9611efbc52..52865787ee 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h

[...]

> @@ -447,6 +456,33 @@ typedef struct QCowL2Meta
>      QLIST_ENTRY(QCowL2Meta) next_in_flight;
>  } QCowL2Meta;
>  
> +/*
> + * In images with standard L2 entries all clusters are treated as if
> + * they had one subcluster so QCow2ClusterType and QCow2SubclusterType
> + * can be mapped to each other and have the exact same meaning
> + * (QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC cannot happen in these images).
> + *
> + * In images with extended L2 entries QCow2ClusterType refers to the
> + * complete cluster and QCow2SubclusterType to each of the individual
> + * subclusters, so there are several possible combinations:
> + *
> + *     |--------------+---------------------------|
> + *     | Cluster type | Possible subcluster types |
> + *     |--------------+---------------------------|
> + *     | UNALLOCATED  |         UNALLOCATED_PLAIN |
> + *     |              |                ZERO_PLAIN |
> + *     |--------------+---------------------------|
> + *     | NORMAL       |         UNALLOCATED_ALLOC |
> + *     |              |                ZERO_ALLOC |
> + *     |              |                    NORMAL |
> + *     |--------------+---------------------------|
> + *     | COMPRESSED   |                COMPRESSED |
> + *     |--------------+---------------------------|
> + *
> + * QCOW2_SUBCLUSTER_INVALID means that the L2 entry is incorrect and
> + * the image should be marked corrupt.
> + */
> +

Oh, a welcome addition! :)

[...]

> @@ -632,6 +678,80 @@ static inline QCow2ClusterType qcow2_get_cluster_type(BlockDriverState *bs,

[...]

> +/*
> + * In an image without subclusters @l2_bitmap is ignored and
> + * @sc_index must be 0.
> + */
> +static inline
> +QCow2SubclusterType qcow2_get_subcluster_type(BlockDriverState *bs,
> +                                              uint64_t l2_entry,
> +                                              uint64_t l2_bitmap,
> +                                              unsigned sc_index)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    QCow2ClusterType type = qcow2_get_cluster_type(bs, l2_entry);
> +    assert(sc_index < s->subclusters_per_cluster);
> +
> +    if (has_subclusters(s)) {
> +        bool sc_zero  = l2_bitmap & QCOW_OFLAG_SUB_ZERO(sc_index);
> +        bool sc_alloc = l2_bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index);
> +        switch (type) {
> +        case QCOW2_CLUSTER_COMPRESSED:
> +            return QCOW2_SUBCLUSTER_COMPRESSED;

Why did you drop the check that l2_bitmap == 0 here?

Max

> +        case QCOW2_CLUSTER_ZERO_PLAIN:
> +        case QCOW2_CLUSTER_ZERO_ALLOC:
> +            return QCOW2_SUBCLUSTER_INVALID;
> +        case QCOW2_CLUSTER_NORMAL:
> +            if (!sc_zero && !sc_alloc) {
> +                return QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC;
> +            } else if (!sc_zero && sc_alloc) {
> +                return QCOW2_SUBCLUSTER_NORMAL;
> +            } else if (sc_zero && !sc_alloc) {
> +                return QCOW2_SUBCLUSTER_ZERO_ALLOC;
> +            } else { /* sc_zero && sc_alloc */
> +                return QCOW2_SUBCLUSTER_INVALID;
> +            }
> +        case QCOW2_CLUSTER_UNALLOCATED:
> +            if (!sc_zero && !sc_alloc) {
> +                return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
> +            } else if (!sc_zero && sc_alloc) {
> +                return QCOW2_SUBCLUSTER_INVALID;
> +            } else if (sc_zero && !sc_alloc) {
> +                return QCOW2_SUBCLUSTER_ZERO_PLAIN;
> +            } else { /* sc_zero && sc_alloc */
> +                return QCOW2_SUBCLUSTER_INVALID;
> +            }
> +        default:
> +            g_assert_not_reached();
> +        }
> +    } else {
> +        return qcow2_cluster_to_subcluster_type(type);
> +    }
> +}
> +
>  /* Check whether refcounts are eager or lazy */
>  static inline bool qcow2_need_accurate_refcounts(BDRVQcow2State *s)
>  {
> 




* Re: [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset()
  2020-03-17 18:16 ` [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset() Alberto Garcia
@ 2020-04-08 12:15   ` Max Reitz
  2020-04-14 12:30   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-08 12:15 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> This function returns an integer that can be either an error code or a
> cluster type (a value from the QCow2ClusterType enum).
> 
> We are going to start using subcluster types instead of cluster types
> in some functions so it's better to use the exact data types instead
> of integers for clarity and in order to detect errors more easily.
> 
> This patch makes qcow2_get_host_offset() return 0 on success and
> puts the returned cluster type in a separate parameter. There are no
> semantic changes.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h         |  3 ++-
>  block/qcow2-cluster.c | 11 +++++++----
>  block/qcow2.c         | 37 ++++++++++++++++++++++---------------
>  3 files changed, 31 insertions(+), 20 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  2020-03-17 18:16 ` [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
@ 2020-04-08 12:42   ` Max Reitz
  2020-04-15  7:10   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-08 12:42 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> In order to support extended L2 entries some functions of the qcow2
> driver need to start dealing with subclusters instead of clusters.
> 
> qcow2_get_host_offset() is modified to return the subcluster type
> instead of the cluster type, and all callers are updated to replace
> all values of QCow2ClusterType with their QCow2SubclusterType
> equivalents.
> 
> This patch only changes the data types, there are no semantic changes.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h         |  2 +-
>  block/qcow2-cluster.c | 10 +++----
>  block/qcow2.c         | 70 ++++++++++++++++++++++---------------------
>  3 files changed, 42 insertions(+), 40 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset()
  2020-03-17 18:16 ` [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset() Alberto Garcia
@ 2020-04-08 12:49   ` Max Reitz
  2020-04-08 17:35     ` Alberto Garcia
  2020-04-22  8:07   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 128+ messages in thread
From: Max Reitz @ 2020-04-08 12:49 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> The logic of this function remains pretty much the same, except that
> it uses count_contiguous_subclusters(), which combines the logic of
> count_contiguous_clusters() / count_contiguous_clusters_unallocated()
> and checks individual subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.h         |  38 +++++------
>  block/qcow2-cluster.c | 143 +++++++++++++++++++++---------------------
>  2 files changed, 85 insertions(+), 96 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

(Oops, totally missed the L1 entry out of bounds / L1 entry empty part
in v3.)



* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-04-08 10:51   ` Max Reitz
@ 2020-04-08 17:29     ` Alberto Garcia
  2020-04-09  7:57     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-08 17:29 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 08 Apr 2020 12:51:24 PM CEST, Max Reitz wrote:
>> -        if (has_data_file(bs) && *cluster_offset != offset - offset_in_cluster)
>> +        if (has_data_file(bs) && *host_offset != offset - offset_in_cluster)
>>          {
>
> (1) The { should be moved to the preceding line;
>
> (2) I think it makes more sense to move the
> “*host_offset += offset_in_cluster” before this condition, so it becomes
> “... && *host_offset != offset”.
>
>>              qcow2_signal_corruption(bs, true, -1, -1,
>>                                      "External data file host cluster offset %#"
>
> (Maybe we then need to drop the “cluster” from this line, but other than
> that, it would fit with this error message.)

The reason why I have “*host_offset += offset_in_cluster” after the
condition is precisely to keep the cluster-aligned offset in the error
message. But of course I could also use start_of_cluster() or similar in
the error message.
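
The two orderings under discussion can be compared in a small
standalone sketch. All names here are hypothetical, the signatures do
not match the driver's, and the check is reduced to the external data
file case where the host offset must equal the guest offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: round 'offset' down to the start of its cluster. */
static inline uint64_t sketch_start_of_cluster(unsigned cluster_bits,
                                               uint64_t offset)
{
    return offset & ~((UINT64_C(1) << cluster_bits) - 1);
}

/* Ordering as posted: compare aligned values, then add the remainder. */
static inline bool sketch_check_as_posted(unsigned cluster_bits,
                                          uint64_t l2_cluster_offset,
                                          uint64_t guest_offset,
                                          uint64_t *host_offset)
{
    uint64_t in_cluster =
        guest_offset - sketch_start_of_cluster(cluster_bits, guest_offset);

    *host_offset = l2_cluster_offset;
    if (*host_offset != guest_offset - in_cluster) {
        return false;   /* *host_offset is still cluster-aligned here */
    }
    *host_offset += in_cluster;
    return true;
}

/* Suggested ordering: add the remainder first, compare full offsets. */
static inline bool sketch_check_reordered(unsigned cluster_bits,
                                          uint64_t l2_cluster_offset,
                                          uint64_t guest_offset,
                                          uint64_t *host_offset)
{
    uint64_t in_cluster =
        guest_offset - sketch_start_of_cluster(cluster_bits, guest_offset);

    *host_offset = l2_cluster_offset + in_cluster;
    if (*host_offset != guest_offset) {
        /* An error message can still recover the aligned value with
         * sketch_start_of_cluster(cluster_bits, *host_offset). */
        return false;
    }
    return true;
}
```

Both variants accept and reject the same inputs. They differ only in
what *host_offset holds on the failure path, which is what matters for
the wording of the error message.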

Berto



* Re: [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset()
  2020-04-08 12:49   ` Max Reitz
@ 2020-04-08 17:35     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-08 17:35 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 08 Apr 2020 02:49:14 PM CEST, Max Reitz wrote:
> (Oops, totally missed the L1 entry out of bounds / L1 entry empty part
> in v3.)

Yeah, and you can mix values between different enum types in C quite
easily without the compiler producing a warning.
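
A minimal sketch of how this can go unnoticed, assuming two made-up
enums whose numeric values do not line up (the names and values below
are hypothetical, not the ones in qcow2.h). Whether a given compiler
warns depends on its version and flags; a plain int return value, as
used before patch 14, converts to either enum silently in C.

```c
#include <assert.h>

/* Hypothetical enums; the numbering is chosen only for illustration. */
typedef enum {
    SKETCH_CLUSTER_UNALLOCATED,        /* 0 */
    SKETCH_CLUSTER_NORMAL,             /* 1 */
    SKETCH_CLUSTER_COMPRESSED          /* 2 */
} SketchClusterType;

typedef enum {
    SKETCH_SUBCLUSTER_UNALLOCATED_PLAIN,   /* 0 */
    SKETCH_SUBCLUSTER_UNALLOCATED_ALLOC,   /* 1 */
    SKETCH_SUBCLUSTER_NORMAL,              /* 2 */
    SKETCH_SUBCLUSTER_COMPRESSED           /* 3 */
} SketchSubclusterType;

/* Old style: an int that is either a negative errno or a type value. */
static inline int sketch_lookup_int(void)
{
    return SKETCH_CLUSTER_NORMAL;   /* stale: should be a subcluster type */
}

/* The caller stores the int into the other enum without any cast. */
static inline SketchSubclusterType sketch_misread(void)
{
    SketchSubclusterType t = sketch_lookup_int();
    return t;
}

/* New style: 0 or negative errno as return value, type via a typed pointer. */
static inline int sketch_lookup_typed(SketchSubclusterType *type)
{
    assert(type != 0);
    *type = SKETCH_SUBCLUSTER_NORMAL;
    return 0;
}
```

In this sketch sketch_misread() yields
SKETCH_SUBCLUSTER_UNALLOCATED_ALLOC although the callee meant "normal".
With the typed out-parameter, passing a pointer to the wrong enum type
is an incompatible pointer type, which compilers typically diagnose.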

Berto



* Re: [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2020-04-08 11:23   ` Max Reitz
@ 2020-04-08 17:46     ` Alberto Garcia
  2020-04-09  8:22       ` Max Reitz
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-08 17:46 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Wed 08 Apr 2020 01:23:42 PM CEST, Max Reitz wrote:
>> +        switch (type) {
>> +        case QCOW2_CLUSTER_COMPRESSED:
>> +            return QCOW2_SUBCLUSTER_COMPRESSED;
>
> Why did you drop the check that l2_bitmap == 0 here?

We don't generally check that reserved bits are 0. Leaving them
unchecked would, for example, allow us to add a new compatible feature
in the future using those bits.
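
The resulting behaviour can be sketched in standalone C. This is an
illustration under assumptions, not the code from the patch: the names
are hypothetical, the cluster type is passed in directly instead of
being derived from the L2 entry, the ZERO_* cluster types are left out,
and the bit layout (allocation bits in the low 32 bits of the bitmap,
zero bits in the high 32 bits) is inferred from the test output in this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bitmap layout: bit x = allocated, bit x + 32 = reads as zeros. */
#define SK_SUB_ALLOC_BIT(x) (UINT64_C(1) << (x))
#define SK_SUB_ZERO_BIT(x)  (SK_SUB_ALLOC_BIT(x) << 32)

typedef enum {
    SK_CLUSTER_UNALLOCATED,
    SK_CLUSTER_NORMAL,
    SK_CLUSTER_COMPRESSED
} SkClusterType;

typedef enum {
    SK_SUB_UNALLOCATED_PLAIN,
    SK_SUB_UNALLOCATED_ALLOC,
    SK_SUB_ZERO_PLAIN,
    SK_SUB_ZERO_ALLOC,
    SK_SUB_NORMAL,
    SK_SUB_COMPRESSED,
    SK_SUB_INVALID
} SkSubType;

static inline SkSubType sk_subcluster_type(SkClusterType type,
                                           uint64_t l2_bitmap,
                                           unsigned sc_index)
{
    bool sc_zero, sc_alloc;

    assert(sc_index < 32);
    sc_zero  = (l2_bitmap & SK_SUB_ZERO_BIT(sc_index)) != 0;
    sc_alloc = (l2_bitmap & SK_SUB_ALLOC_BIT(sc_index)) != 0;

    switch (type) {
    case SK_CLUSTER_COMPRESSED:
        /* The bitmap is reserved for compressed clusters and not checked. */
        return SK_SUB_COMPRESSED;
    case SK_CLUSTER_NORMAL:
        if (sc_zero && sc_alloc) {
            return SK_SUB_INVALID;
        }
        if (sc_alloc) {
            return SK_SUB_NORMAL;
        }
        return sc_zero ? SK_SUB_ZERO_ALLOC : SK_SUB_UNALLOCATED_ALLOC;
    case SK_CLUSTER_UNALLOCATED:
        if (sc_alloc) {
            return SK_SUB_INVALID;   /* with or without the zero bit */
        }
        return sc_zero ? SK_SUB_ZERO_PLAIN : SK_SUB_UNALLOCATED_PLAIN;
    }
    return SK_SUB_INVALID;
}
```

This lines up with the expected test output: an unallocated cluster
with bitmap 0000000100000001 is reported as invalid, while a compressed
cluster with bitmap 0000000101000000 is read without error.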

Berto



* Re: [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset
  2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
  2020-03-18 11:23   ` Eric Blake
  2020-04-08 10:23   ` Max Reitz
@ 2020-04-09  6:49   ` Vladimir Sementsov-Ogievskiy
  2 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-09  6:49 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:15, Alberto Garcia wrote:
> The file_cluster_offset field of Qcow2AioTask stores a cluster-aligned
> host offset. In practice this is not very useful because all users(*)
> of this structure need the final host offset into the cluster, which
> they calculate using
> 
>     host_offset = file_cluster_offset + offset_into_cluster(s, offset)
> 
> There is no reason why Qcow2AioTask cannot store host_offset directly
> and that is what this patch does.
> 
> (*) compressed clusters are the exception: in this case what
>      file_cluster_offset was storing was the full compressed cluster
>      descriptor (offset + size). This does not change with this patch
>      but it is documented now.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.c | 68 +++++++++++++++++++++++++--------------------------
>   1 file changed, 33 insertions(+), 35 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index d44b45633d..a00b0c8e45 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -74,7 +74,7 @@ typedef struct {
>   
>   static int coroutine_fn
>   qcow2_co_preadv_compressed(BlockDriverState *bs,
> -                           uint64_t file_cluster_offset,
> +                           uint64_t cluster_descriptor,
>                              uint64_t offset,
>                              uint64_t bytes,
>                              QEMUIOVector *qiov,
> @@ -2041,7 +2041,7 @@ out:
>   
>   static coroutine_fn int
>   qcow2_co_preadv_encrypted(BlockDriverState *bs,
> -                           uint64_t file_cluster_offset,
> +                           uint64_t host_offset,
>                              uint64_t offset,
>                              uint64_t bytes,
>                              QEMUIOVector *qiov,
> @@ -2068,16 +2068,12 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
>       }
>   
>       BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
> -    ret = bdrv_co_pread(s->data_file,
> -                        file_cluster_offset + offset_into_cluster(s, offset),
> -                        bytes, buf, 0);
> +    ret = bdrv_co_pread(s->data_file, host_offset, bytes, buf, 0);
>       if (ret < 0) {
>           goto fail;
>       }
>   
> -    if (qcow2_co_decrypt(bs,
> -                         file_cluster_offset + offset_into_cluster(s, offset),
> -                         offset, buf, bytes) < 0)
> +    if (qcow2_co_decrypt(bs, host_offset, offset, buf, bytes) < 0)
>       {
>           ret = -EIO;
>           goto fail;
> @@ -2095,7 +2091,7 @@ typedef struct Qcow2AioTask {
>   
>       BlockDriverState *bs;
>       QCow2ClusterType cluster_type; /* only for read */
> -    uint64_t file_cluster_offset;
> +    uint64_t host_offset; /* or full descriptor in compressed clusters */
>       uint64_t offset;
>       uint64_t bytes;
>       QEMUIOVector *qiov;
> @@ -2108,7 +2104,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
>                                          AioTaskPool *pool,
>                                          AioTaskFunc func,
>                                          QCow2ClusterType cluster_type,
> -                                       uint64_t file_cluster_offset,
> +                                       uint64_t host_offset,
>                                          uint64_t offset,
>                                          uint64_t bytes,
>                                          QEMUIOVector *qiov,
> @@ -2123,7 +2119,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
>           .bs = bs,
>           .cluster_type = cluster_type,
>           .qiov = qiov,
> -        .file_cluster_offset = file_cluster_offset,
> +        .host_offset = host_offset,
>           .offset = offset,
>           .bytes = bytes,
>           .qiov_offset = qiov_offset,
> @@ -2132,7 +2128,7 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
>   
>       trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
>                            func == qcow2_co_preadv_task_entry ? "read" : "write",
> -                         cluster_type, file_cluster_offset, offset, bytes,
> +                         cluster_type, host_offset, offset, bytes,

Please also update the trace point in block/trace-events.

>                            qiov, qiov_offset);
>   
>       if (!pool) {
> @@ -2146,13 +2142,12 @@ static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
>   


Maybe add a comment:
/* host_offset: host offset, or cluster descriptor for compressed cluster */
>   static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
>                                                QCow2ClusterType cluster_type,
> -                                             uint64_t file_cluster_offset,
> +                                             uint64_t host_offset,
>                                                uint64_t offset, uint64_t bytes,
>                                                QEMUIOVector *qiov,
>                                                size_t qiov_offset)
>   {
>       BDRVQcow2State *s = bs->opaque;
> -    int offset_in_cluster = offset_into_cluster(s, offset);
>   
>       switch (cluster_type) {
>       case QCOW2_CLUSTER_ZERO_PLAIN:
> @@ -2168,19 +2163,17 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
>                                      qiov, qiov_offset, 0);
>   
>       case QCOW2_CLUSTER_COMPRESSED:
> -        return qcow2_co_preadv_compressed(bs, file_cluster_offset,
> +        return qcow2_co_preadv_compressed(bs, host_offset,
>                                             offset, bytes, qiov, qiov_offset);
>   
>       case QCOW2_CLUSTER_NORMAL:
> -        assert(offset_into_cluster(s, file_cluster_offset) == 0);
>           if (bs->encrypted) {
> -            return qcow2_co_preadv_encrypted(bs, file_cluster_offset,
> +            return qcow2_co_preadv_encrypted(bs, host_offset,
>                                                offset, bytes, qiov, qiov_offset);
>           }
>   
>           BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
> -        return bdrv_co_preadv_part(s->data_file,
> -                                   file_cluster_offset + offset_in_cluster,
> +        return bdrv_co_preadv_part(s->data_file, host_offset,
>                                      bytes, qiov, qiov_offset, 0);
>   
>       default:
> @@ -2196,7 +2189,7 @@ static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
>   
>       assert(!t->l2meta);
>   
> -    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->file_cluster_offset,
> +    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->host_offset,
>                                   t->offset, t->bytes, t->qiov, t->qiov_offset);
>   }
>   
> @@ -2232,11 +2225,20 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
>           {
>               qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
>           } else {
> +            /*
> +             * For compressed clusters the variable cluster_offset
> +             * does not actually store the offset but the full
> +             * descriptor. We need to leave it unchanged because
> +             * that's what qcow2_co_preadv_compressed() expects.
> +             */

Hmm, it would be good to document this for the qcow2_get_cluster_offset() function. Maybe you did that in the next patch.

With at least the trace event updated:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
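
For readers following along, the calculation that this patch moves out of the callers can be sketched in isolation. This is only an illustration: SketchState and the sketch_* helpers are hypothetical stand-ins, not the QEMU definitions, and the compressed-cluster exception from the commit message is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one BDRVQcow2State field used here. */
typedef struct {
    uint64_t cluster_size;   /* assumed to be a power of two */
} SketchState;

/* Byte position of @offset inside its cluster. */
static uint64_t sketch_offset_into_cluster(const SketchState *s,
                                           uint64_t offset)
{
    assert(s->cluster_size && (s->cluster_size & (s->cluster_size - 1)) == 0);
    return offset & (s->cluster_size - 1);
}

/* What every caller used to compute by hand, and what the task now stores. */
static uint64_t sketch_host_offset(const SketchState *s,
                                   uint64_t file_cluster_offset,
                                   uint64_t guest_offset)
{
    return file_cluster_offset + sketch_offset_into_cluster(s, guest_offset);
}
```

With 64 KiB clusters, a guest offset of 0x12345 in a cluster stored at 0x50000 would map to host offset 0x52345.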

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-03-17 18:15 ` [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
  2020-03-18 12:08   ` Eric Blake
  2020-04-08 10:51   ` Max Reitz
@ 2020-04-09  7:50   ` Vladimir Sementsov-Ogievskiy
  2020-04-09 14:45     ` Alberto Garcia
  2 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-09  7:50 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:15, Alberto Garcia wrote:
> qcow2_get_cluster_offset() takes an (unaligned) guest offset and
> returns the (aligned) offset of the corresponding cluster in the qcow2
> image.
> 
> In practice none of the callers need to know where the cluster starts
> so this patch makes the function calculate and return the final host
> offset directly. The function is also renamed accordingly.

Great that you rename functions and variables whose behavior changes; it simplifies reviewing!

> 
> There is a pre-existing exception with compressed clusters: in this
> case the function returns the complete cluster descriptor (containing
> the offset and size of the compressed data). This does not change with
> this patch but it is now documented.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

[..]

> -int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
> -                             unsigned int *bytes, uint64_t *cluster_offset)
> +int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
> +                          unsigned int *bytes, uint64_t *host_offset)
>   {
>       BDRVQcow2State *s = bs->opaque;
>       unsigned int l2_index;
> -    uint64_t l1_index, l2_offset, *l2_slice;
> +    uint64_t l1_index, l2_offset, *l2_slice, l2_entry;
>       int c;
>       unsigned int offset_in_cluster;
>       uint64_t bytes_available, bytes_needed, nb_clusters;
> @@ -537,7 +542,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>           bytes_needed = bytes_available;
>       }
>   
> -    *cluster_offset = 0;
> +    *host_offset = 0;
>   
>       /* seek to the l2 offset in the l1 table */
>   
> @@ -570,7 +575,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>       /* find the cluster offset for the given disk offset */
>   
>       l2_index = offset_to_l2_slice_index(s, offset);
> -    *cluster_offset = be64_to_cpu(l2_slice[l2_index]);
> +    l2_entry = be64_to_cpu(l2_slice[l2_index]);
>   
>       nb_clusters = size_to_clusters(s, bytes_needed);
>       /* bytes_needed <= *bytes + offset_in_cluster, both of which are unsigned
> @@ -578,7 +583,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>        * true */
>       assert(nb_clusters <= INT_MAX);
>   
> -    type = qcow2_get_cluster_type(bs, *cluster_offset);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
>       if (s->qcow_version < 3 && (type == QCOW2_CLUSTER_ZERO_PLAIN ||
>                                   type == QCOW2_CLUSTER_ZERO_ALLOC)) {
>           qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
> @@ -599,41 +604,42 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>           }
>           /* Compressed clusters can only be processed one by one */
>           c = 1;
> -        *cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
> +        *host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
>           break;
>       case QCOW2_CLUSTER_ZERO_PLAIN:
>       case QCOW2_CLUSTER_UNALLOCATED:
>           /* how many empty clusters ? */
>           c = count_contiguous_clusters_unallocated(bs, nb_clusters,
>                                                     &l2_slice[l2_index], type);
> -        *cluster_offset = 0;
> +        *host_offset = 0;

Actually, this is a dead assignment now... But I feel it's better to keep it.

Hmm. Maybe drop the first assignment of zero to host_offset? We don't actually need it: the caller should not rely on host_offset if we return an error.

>           break;
>       case QCOW2_CLUSTER_ZERO_ALLOC:
>       case QCOW2_CLUSTER_NORMAL:
>           /* how many allocated clusters ? */
>           c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
>                                         &l2_slice[l2_index], QCOW_OFLAG_ZERO);
> -        *cluster_offset &= L2E_OFFSET_MASK;
> -        if (offset_into_cluster(s, *cluster_offset)) {
> +        *host_offset = l2_entry & L2E_OFFSET_MASK;
> +        if (offset_into_cluster(s, *host_offset)) {
>               qcow2_signal_corruption(bs, true, -1, -1,
>                                       "Cluster allocation offset %#"
>                                       PRIx64 " unaligned (L2 offset: %#" PRIx64
> -                                    ", L2 index: %#x)", *cluster_offset,
> +                                    ", L2 index: %#x)", *host_offset,
>                                       l2_offset, l2_index);
>               ret = -EIO;
>               goto fail;
>           }
> -        if (has_data_file(bs) && *cluster_offset != offset - offset_in_cluster)

[..]

> @@ -3735,7 +3726,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>           offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
>           bytes = s->cluster_size;

Unrelated to the patch, but... why do we change bytes? So we can finish with success but zero out only the first cluster?

Ah, found it: the generic block layer takes care of this and never issues unaligned requests that cross a cluster boundary.

>           nr = s->cluster_size;
> -        ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
> +        ret = qcow2_get_host_offset(bs, offset, &nr, &off);
>           if (ret != QCOW2_CLUSTER_UNALLOCATED &&
>               ret != QCOW2_CLUSTER_ZERO_PLAIN &&
>               ret != QCOW2_CLUSTER_ZERO_ALLOC) {
> @@ -3800,7 +3791,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>           cur_bytes = MIN(bytes, INT_MAX);
>           cur_write_flags = write_flags;
>   
> -        ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, &copy_offset);
> +        ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes, &copy_offset);
>           if (ret < 0) {
>               goto out;
>           }
> @@ -3832,7 +3823,6 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>   
>           case QCOW2_CLUSTER_NORMAL:
>               child = s->data_file;
> -            copy_offset += offset_into_cluster(s, src_offset);
>               break;
>   
>           default:
> 

With or without the first assignment dropped:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
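
As a side note on the compressed-cluster exception mentioned in the commit message: the descriptor packs the host offset and the size of the compressed data into one 64-bit value. Below is a minimal sketch of how such a descriptor could be unpacked, assuming the layout from the qcow2 specification (x = 62 - (cluster_bits - 8), offset in bits 0 to x-1, number of additional 512-byte sectors in bits x to 61). The names are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
typedef struct {
    uint64_t coffset;      /* host offset of the compressed data */
    uint64_t nb_csectors;  /* number of 512-byte sectors the data may span */
} SketchCompressed;

/*
 * Split a compressed cluster descriptor (flag bits 62-63 already
 * masked off) into its two fields.
 */
static SketchCompressed sketch_decode_descriptor(uint64_t descriptor,
                                                 unsigned cluster_bits)
{
    /* Cluster sizes from 512 bytes to 2 MiB. */
    assert(cluster_bits >= 9 && cluster_bits <= 21);

    unsigned csize_shift = 62 - (cluster_bits - 8);
    uint64_t csize_mask = (UINT64_C(1) << (cluster_bits - 8)) - 1;
    uint64_t offset_mask = (UINT64_C(1) << csize_shift) - 1;

    SketchCompressed c = {
        .coffset = descriptor & offset_mask,
        /* The field counts *additional* sectors, hence the + 1. */
        .nb_csectors = ((descriptor >> csize_shift) & csize_mask) + 1,
    };
    return c;
}
```

This is why the value cannot simply have offset_into_cluster() added to it like the other cluster types.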

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-04-08 10:51   ` Max Reitz
  2020-04-08 17:29     ` Alberto Garcia
@ 2020-04-09  7:57     ` Vladimir Sementsov-Ogievskiy
  2020-04-09 14:35       ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-09  7:57 UTC (permalink / raw)
  To: Max Reitz, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block

08.04.2020 13:51, Max Reitz wrote:
> On 17.03.20 19:15, Alberto Garcia wrote:
>> qcow2_get_cluster_offset() takes an (unaligned) guest offset and
>> returns the (aligned) offset of the corresponding cluster in the qcow2
>> image.
>>
>> In practice none of the callers need to know where the cluster starts
>> so this patch makes the function calculate and return the final host
>> offset directly. The function is also renamed accordingly.
>>
>> There is a pre-existing exception with compressed clusters: in this
>> case the function returns the complete cluster descriptor (containing
>> the offset and size of the compressed data). This does not change with
>> this patch but it is now documented.
>>
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   block/qcow2.h         |  4 ++--
>>   block/qcow2-cluster.c | 38 ++++++++++++++++++++++----------------
>>   block/qcow2.c         | 24 +++++++-----------------
>>   3 files changed, 31 insertions(+), 35 deletions(-)
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 0942126232..f47ef6ca4e 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
> 
> [...]
> 
>>       case QCOW2_CLUSTER_ZERO_ALLOC:
>>       case QCOW2_CLUSTER_NORMAL:
>>           /* how many allocated clusters ? */
>>           c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
>>                                         &l2_slice[l2_index], QCOW_OFLAG_ZERO);
>> -        *cluster_offset &= L2E_OFFSET_MASK;
>> -        if (offset_into_cluster(s, *cluster_offset)) {
>> +        *host_offset = l2_entry & L2E_OFFSET_MASK;
>> +        if (offset_into_cluster(s, *host_offset)) {
>>               qcow2_signal_corruption(bs, true, -1, -1,
>>                                       "Cluster allocation offset %#"
>>                                       PRIx64 " unaligned (L2 offset: %#" PRIx64
>> -                                    ", L2 index: %#x)", *cluster_offset,
>> +                                    ", L2 index: %#x)", *host_offset,
>>                                       l2_offset, l2_index);
>>               ret = -EIO;
>>               goto fail;
>>           }
>> -        if (has_data_file(bs) && *cluster_offset != offset - offset_in_cluster)
>> +        if (has_data_file(bs) && *host_offset != offset - offset_in_cluster)
>>           {
> 
> (1) The { should be moved to the preceding line;
> 
> (2) I think it makes more sense to move the
> “*host_offset += offset_in_cluster” before this condition, so it becomes
> “... && *host_offset != offset”.
> 
>>               qcow2_signal_corruption(bs, true, -1, -1,
>>                                       "External data file host cluster offset %#"
> 
> (Maybe we then need to drop the “cluster” from this line, but other than
> that, it would fit with this error message.)
> 

The message would be less useful, I think. It is better to compare the two cluster offsets, as it is the host cluster offset that is specified by the qcow2 metadata, not the host offset.

What about squashing this:

--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -615,32 +615,34 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
          break;
      case QCOW2_CLUSTER_ZERO_ALLOC:
      case QCOW2_CLUSTER_NORMAL:
+    {
+        uint64_t host_cluster_offset = l2_entry & L2E_OFFSET_MASK;
+        *host_offset = host_cluster_offset + offset_in_cluster;
+
          /* how many allocated clusters ? */
          c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
                                        &l2_slice[l2_index], QCOW_OFLAG_ZERO);
-        *host_offset = l2_entry & L2E_OFFSET_MASK;
-        if (offset_into_cluster(s, *host_offset)) {
+        if (offset_into_cluster(s, host_cluster_offset)) {
              qcow2_signal_corruption(bs, true, -1, -1,
                                      "Cluster allocation offset %#"
                                      PRIx64 " unaligned (L2 offset: %#" PRIx64
-                                    ", L2 index: %#x)", *host_offset,
+                                    ", L2 index: %#x)", host_cluster_offset,
                                      l2_offset, l2_index);
              ret = -EIO;
              goto fail;
          }
-        if (has_data_file(bs) && *host_offset != offset - offset_in_cluster)
-        {
+        if (has_data_file(bs) && *host_offset != offset) {
              qcow2_signal_corruption(bs, true, -1, -1,
                                      "External data file host cluster offset %#"
                                      PRIx64 " does not match guest cluster "
                                      "offset: %#" PRIx64
-                                    ", L2 index: %#x)", *host_offset,
+                                    ", L2 index: %#x)", host_cluster_offset,
                                      offset - offset_in_cluster, l2_index);
              ret = -EIO;
              goto fail;
          }
-        *host_offset += offset_in_cluster;
          break;
+    }
      default:
          abort();
      }
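
To make the intent of the squash easier to see outside the diff, here is a standalone sketch of the NORMAL / ZERO_ALLOC branch. It is an approximation under assumptions: SKETCH_L2E_OFFSET_MASK and sketch_normal_host_offset() are hypothetical names, the corruption reporting is reduced to a plain -EIO, and unlike the diff it only writes *host_offset after the alignment check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror L2E_OFFSET_MASK: bits 9-55 of the L2 entry. */
#define SKETCH_L2E_OFFSET_MASK UINT64_C(0x00fffffffffffe00)

static int sketch_normal_host_offset(uint64_t l2_entry, uint64_t guest_offset,
                                     uint64_t cluster_size, bool has_data_file,
                                     uint64_t *host_offset)
{
    assert(cluster_size >= 512 && (cluster_size & (cluster_size - 1)) == 0);

    uint64_t offset_in_cluster = guest_offset & (cluster_size - 1);
    uint64_t host_cluster_offset = l2_entry & SKETCH_L2E_OFFSET_MASK;

    /* The metadata stores a cluster offset, so that is what gets checked. */
    if (host_cluster_offset & (cluster_size - 1)) {
        return -EIO;
    }

    *host_offset = host_cluster_offset + offset_in_cluster;

    /* With an external data file, host and guest offsets must be identical. */
    if (has_data_file && *host_offset != guest_offset) {
        return -EIO;
    }
    return 0;
}
```

Keeping host_cluster_offset in its own variable lets the error paths keep reporting the value that actually comes from the L2 entry.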



-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2020-04-08 17:46     ` Alberto Garcia
@ 2020-04-09  8:22       ` Max Reitz
  0 siblings, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-09  8:22 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 08.04.20 19:46, Alberto Garcia wrote:
> On Wed 08 Apr 2020 01:23:42 PM CEST, Max Reitz wrote:
>>> +        switch (type) {
>>> +        case QCOW2_CLUSTER_COMPRESSED:
>>> +            return QCOW2_SUBCLUSTER_COMPRESSED;
>>
>> Why did you drop the check that l2_bitmap == 0 here?
> 
> We don't generally check that reserved bits are 0. It would for example
> allow us to add a new compatible feature in the future using those bits.

OK.  The spec as you wrote it would allow that, and if we ever used
those bits we’d probably need to add a feature bit to the header anyway.
 (More so if we returned an error here.)

Reviewed-by: Max Reitz <mreitz@redhat.com>
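
The point agreed on above can be reduced to a tiny sketch: for a compressed cluster the classification looks only at the L2 entry, and the bitmap word is deliberately left unvalidated. SKETCH_OFLAG_COMPRESSED and sketch_is_compressed() are hypothetical names; the flag is assumed to be bit 62 as in the qcow2 specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror QCOW_OFLAG_COMPRESSED (bit 62 of the L2 entry). */
#define SKETCH_OFLAG_COMPRESSED (UINT64_C(1) << 62)

/*
 * Returns true if the entry describes a compressed cluster. The bitmap
 * is reserved in that case and is intentionally not checked, so that a
 * future compatible feature could make use of those bits.
 */
static bool sketch_is_compressed(uint64_t l2_entry, uint64_t l2_bitmap)
{
    (void)l2_bitmap;
    return (l2_entry & SKETCH_OFLAG_COMPRESSED) != 0;
}
```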


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 03/30] qcow2: Add calculate_l2_meta()
  2020-03-17 18:16 ` [PATCH v4 03/30] qcow2: Add calculate_l2_meta() Alberto Garcia
@ 2020-04-09  8:30   ` Vladimir Sementsov-Ogievskiy
  2020-04-09 15:12     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-09  8:30 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> handle_alloc() creates a QCowL2Meta structure in order to update the
> image metadata and perform the necessary copy-on-write operations.
> 
> This patch moves that code to a separate function so it can be used
> from other places.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c | 77 +++++++++++++++++++++++++++++--------------
>   1 file changed, 53 insertions(+), 24 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 95f04d12cc..802fc599a5 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1039,6 +1039,56 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>       }
>   }
>   
> +/*
> + * For a given write request, create a new QCowL2Meta structure, add
> + * it to @m and the BDRVQcow2State.cluster_allocs list.
> + *
> + * @host_cluster_offset points to the beginning of the first cluster.
> + *
> + * @guest_offset and @bytes indicate the offset and length of the
> + * request.
> + *
> + * If @keep_old is true it means that the clusters were already
> + * allocated and will be overwritten. If false then the clusters are
> + * new and we have to decrease the reference count of the old ones.
> + */
> +static void calculate_l2_meta(BlockDriverState *bs,
> +                              uint64_t host_cluster_offset,
> +                              uint64_t guest_offset, unsigned bytes,
> +                              QCowL2Meta **m, bool keep_old)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    unsigned cow_start_from = 0;
> +    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
> +    unsigned cow_end_from = cow_start_to + bytes;
> +    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
> +    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
> +    QCowL2Meta *old_m = *m;
> +
> +    *m = g_malloc0(sizeof(**m));
> +    **m = (QCowL2Meta) {
> +        .next           = old_m,
> +
> +        .alloc_offset   = host_cluster_offset,
> +        .offset         = start_of_cluster(s, guest_offset),
> +        .nb_clusters    = nb_clusters,
> +
> +        .keep_old_clusters = keep_old,
> +
> +        .cow_start = {
> +            .offset     = cow_start_from,
> +            .nb_bytes   = cow_start_to - cow_start_from,
> +        },
> +        .cow_end = {
> +            .offset     = cow_end_from,

Hmm. So you make it equal to requested_bytes from handle_alloc(). But before your change it was MIN(requested_bytes, avail_bytes). If avail_bytes can be less than requested_bytes then the patch breaks it; if not, we'd better drop this MIN.

> +            .nb_bytes   = cow_end_to - cow_end_from,
> +        },
> +    };
> +
> +    qemu_co_queue_init(&(*m)->dependent_requests);
> +    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
> +}
> +
>   /*
>    * Returns the number of contiguous clusters that can be used for an allocating
>    * write, but require COW to be performed (this includes yet unallocated space,
> @@ -1437,35 +1487,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>       uint64_t requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
>       int avail_bytes = nb_clusters << s->cluster_bits;
>       int nb_bytes = MIN(requested_bytes, avail_bytes);
> -    QCowL2Meta *old_m = *m;
> -
> -    *m = g_malloc0(sizeof(**m));
> -
> -    **m = (QCowL2Meta) {
> -        .next           = old_m,
> -
> -        .alloc_offset   = alloc_cluster_offset,
> -        .offset         = start_of_cluster(s, guest_offset),
> -        .nb_clusters    = nb_clusters,
> -
> -        .keep_old_clusters  = keep_old_clusters,
> -
> -        .cow_start = {
> -            .offset     = 0,
> -            .nb_bytes   = offset_into_cluster(s, guest_offset),
> -        },
> -        .cow_end = {
> -            .offset     = nb_bytes,
> -            .nb_bytes   = avail_bytes - nb_bytes,
> -        },
> -    };
> -    qemu_co_queue_init(&(*m)->dependent_requests);
> -    QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>   
>       *host_offset = alloc_cluster_offset + offset_into_cluster(s, guest_offset);
>       *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>       assert(*bytes != 0);
>   
> +    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
> +                      m, keep_old_clusters);
> +
>       return 1;
>   
>   fail:
> 
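
To help reason about the MIN() question, the region arithmetic from calculate_l2_meta() can be tried out standalone. This is a sketch with hypothetical names (SketchCow, sketch_cow_regions) that takes the cluster size as a plain parameter. Note that in the hunk above handle_alloc() clamps *bytes before the call, so as far as I can tell cow_end_from (offset into the cluster plus the clamped *bytes) works out to the old MIN(requested_bytes, avail_bytes); that reading is an assumption drawn from the quoted code, not a confirmed answer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the computed regions. */
typedef struct {
    unsigned cow_start_from, cow_start_to;  /* head COW: [from, to) */
    unsigned cow_end_from, cow_end_to;      /* tail COW: [from, to) */
    unsigned nb_clusters;
} SketchCow;

static SketchCow sketch_cow_regions(uint64_t guest_offset, unsigned bytes,
                                    unsigned cluster_size)
{
    assert(cluster_size && (cluster_size & (cluster_size - 1)) == 0);
    assert(bytes > 0);

    SketchCow c;
    c.cow_start_from = 0;
    c.cow_start_to = (unsigned)(guest_offset & (cluster_size - 1));
    c.cow_end_from = c.cow_start_to + bytes;
    /* ROUND_UP(cow_end_from, cluster_size) */
    c.cow_end_to = (c.cow_end_from + cluster_size - 1) & ~(cluster_size - 1);
    /* size_to_clusters(cow_end_from) */
    c.nb_clusters = c.cow_end_to / cluster_size;
    return c;
}
```

For example, a 0x1000-byte write at guest offset 0x12345 with 64 KiB clusters gives a head COW region of [0, 0x2345) and a tail COW region of [0x3345, 0x10000) in a single cluster.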


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-03-17 18:16 ` [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
@ 2020-04-09 10:05   ` Max Reitz
  2020-04-10 12:47     ` Alberto Garcia
  2020-04-22 11:35   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 128+ messages in thread
From: Max Reitz @ 2020-04-09 10:05 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> Two changes are needed in this function:
> 
> 1) A full discard deallocates a cluster so we can skip the operation if
>    it is already unallocated. With extended L2 entries however if any
>    of the subclusters has the 'all zeroes' bit set then we have to
>    clear it.
> 
> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>    image has extended L2 entries. Instead, the individual 'all zeroes'
>    bits must be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 746006a117..824c710760 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>           * TODO We might want to use bdrv_block_status(bs) here, but we're
>           * holding s->lock, so that doesn't work today.
>           *
> -         * If full_discard is true, the sector should not read back as zeroes,
> +         * If full_discard is true, the cluster should not read back as zeroes,
>           * but rather fall through to the backing file.
>           */
>          switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>          case QCOW2_CLUSTER_UNALLOCATED:
> -            if (full_discard || !bs->backing) {
> +            if (full_discard) {
> +                /* If the image has extended L2 entries we can only
> +                 * skip this operation if the L2 bitmap is zero. */
> +                uint64_t bitmap = has_subclusters(s) ?
> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;

Isn’t this bitmap only valid for standard clusters?  In this case, the
whole cluster is unallocated, so the bitmap shouldn’t be relevant, AFAIU.

Max

> +                if (bitmap == 0) {
> +                    continue;
> +                }
> +            } else if (!bs->backing) {
>                  continue;
>              }
>              break;
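
For reference while discussing this hunk, the skip decision for an UNALLOCATED entry as written in the patch can be restated as a small pure function. This is only a sketch with hypothetical names; it restates the quoted code and the commit message (a subcluster may still have its 'all zeroes' bit set) and does not settle the question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if discard can skip an UNALLOCATED L2 entry.
 * @l2_bitmap is only meaningful when @has_subclusters is true.
 */
static bool sketch_can_skip_unallocated(bool full_discard, bool has_backing,
                                        bool has_subclusters,
                                        uint64_t l2_bitmap)
{
    if (full_discard) {
        /* Any bit still set in the bitmap has to be cleared first. */
        uint64_t bitmap = has_subclusters ? l2_bitmap : 0;
        return bitmap == 0;
    }
    /* Without a backing file the cluster already reads as zeroes. */
    return !has_backing;
}
```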



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters
  2020-03-17 18:16 ` [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters Alberto Garcia
@ 2020-04-09 10:27   ` Max Reitz
  2020-04-10 16:42     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Max Reitz @ 2020-04-09 10:27 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> This function is only used by qcow2_expand_zero_clusters() to
> downgrade a qcow2 image to a previous version. It is however not
> possible to downgrade an image with extended L2 entries because older
> versions of qcow2 do not have this feature.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c      | 8 +++++++-
>  tests/qemu-iotests/061     | 6 ++++++
>  tests/qemu-iotests/061.out | 5 +++++
>  3 files changed, 18 insertions(+), 1 deletion(-)

[...]

> diff --git a/tests/qemu-iotests/061 b/tests/qemu-iotests/061
> index 36b040491f..66bfd23179 100755
> --- a/tests/qemu-iotests/061
> +++ b/tests/qemu-iotests/061
> @@ -266,6 +266,12 @@ $QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
>  _img_info --format-specific
>  _check_test_img
>  
> +echo
> +echo "=== Testing version downgrade with extended L2 entries ==="
> +echo
> +_make_test_img -o "compat=1.1,extended_l2=on" 64M
> +$QEMU_IMG amend -o "compat=0.10" "$TEST_IMG"
> +
>  echo
>  echo "=== Try changing the external data file ==="
>  echo
> diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
> index 8b3091a412..5d009867a2 100644
> --- a/tests/qemu-iotests/061.out
> +++ b/tests/qemu-iotests/061.out
> @@ -498,6 +498,11 @@ Format specific information:
>      corrupt: false
>  No errors were found on the image.
>  
> +=== Testing version downgrade with extended L2 entries ===
> +
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
> +qemu-img: Cannot downgrade an image with incompatible features 0x10 set

This test fails in this commit, because extended_l2 is only available
after the next commit.  The code changes and the test itself look good
to me, though.

Max



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2020-03-17 18:16 ` [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
@ 2020-04-09 10:59   ` Vladimir Sementsov-Ogievskiy
  2020-04-09 16:08     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-09 10:59 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

I'm sorry that I'm only joining now and may ask questions that were already discussed in previous versions :(

17.03.2020 21:16, Alberto Garcia wrote:
> When writing to a qcow2 file there are two functions that take a
> virtual offset and return a host offset, possibly allocating new
> clusters if necessary:
> 
>     - handle_copied() looks for normal data clusters that are already
>       allocated and have a reference count of 1. In those clusters we
>       can simply write the data and there is no need to perform any
>       copy-on-write.
> 
>     - handle_alloc() looks for clusters that do need copy-on-write,
>       either because they haven't been allocated yet, because their
>       reference count is != 1 or because they are ZERO_ALLOC clusters.
> 
> The ZERO_ALLOC case is a bit special because those are clusters that
> are already allocated and they could perfectly be dealt with in
> handle_copied() (as long as copy-on-write is performed when required).
> 
> In fact, there is extra code specifically for them in handle_alloc()
> that tries to reuse the existing allocation if possible and frees them
> otherwise.
> 
> This patch changes the handling of ZERO_ALLOC clusters so the
> semantics of these two functions are now like this:
> 
>     - handle_copied() looks for clusters that are already allocated and
>       which we can overwrite (NORMAL and ZERO_ALLOC clusters with a
>       reference count of 1).
> 
>     - handle_alloc() looks for clusters for which we need a new
>       allocation (all other cases).
> 
> One important difference after this change is that clusters found
> in handle_copied() may now require copy-on-write, but this will be
> necessary anyway once we add support for subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c | 230 ++++++++++++++++++++++++------------------
>   1 file changed, 130 insertions(+), 100 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e251d00890..5c81046c34 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1041,13 +1041,18 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>   
>   /*
>    * For a given write request, create a new QCowL2Meta structure, add
> - * it to @m and the BDRVQcow2State.cluster_allocs list.
> + * it to @m and the BDRVQcow2State.cluster_allocs list. If the write
> + * request does not need copy-on-write or changes to the L2 metadata
> + * then this function does nothing.
>    *
>    * @host_cluster_offset points to the beginning of the first cluster.
>    *
>    * @guest_offset and @bytes indicate the offset and length of the
>    * request.
>    *
> + * @l2_slice contains the L2 entries of all clusters involved in this
> + * write request.
> + *
>    * If @keep_old is true it means that the clusters were already
>    * allocated and will be overwritten. If false then the clusters are
>    * new and we have to decrease the reference count of the old ones.
> @@ -1055,15 +1060,53 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>   static void calculate_l2_meta(BlockDriverState *bs,
>                                 uint64_t host_cluster_offset,
>                                 uint64_t guest_offset, unsigned bytes,
> -                              QCowL2Meta **m, bool keep_old)
> +                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
>   {
>       BDRVQcow2State *s = bs->opaque;
> -    unsigned cow_start_from = 0;
> +    int l2_index = offset_to_l2_slice_index(s, guest_offset);
> +    uint64_t l2_entry;
> +    unsigned cow_start_from, cow_end_to;
>       unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>       unsigned cow_end_from = cow_start_to + bytes;
> -    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>       unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>       QCowL2Meta *old_m = *m;
> +    QCow2ClusterType type;
> +
> +    assert(nb_clusters <= s->l2_slice_size - l2_index);
> +
> +    /* Return if there's no COW (all clusters are normal and we keep them) */
> +    if (keep_old) {
> +        int i;
> +        for (i = 0; i < nb_clusters; i++) {
> +            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> +            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {

Could we also allow full ZERO_ALLOC clusters here?

> +                break;
> +            }
> +        }
> +        if (i == nb_clusters) {
> +            return;
> +        }
> +    }
> +
> +    /* Get the L2 entry of the first cluster */
> +    l2_entry = be64_to_cpu(l2_slice[l2_index]);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
> +
> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> +        cow_start_from = cow_start_to;
> +    } else {
> +        cow_start_from = 0;
> +    }
> +
> +    /* Get the L2 entry of the last cluster */
> +    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
> +    type = qcow2_get_cluster_type(bs, l2_entry);
> +
> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> +        cow_end_to = cow_end_from;
> +    } else {
> +        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
> +    }

These two ifs could be moved inside the if (keep_old) block, dropping "&& keep_old" from their conditions.
That would also avoid the extra calculations, let the new variables live inside the if (keep_old) {} block, and make it possible to pass l2_slice=NULL together with keep_old=false.
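
A rough, untested sketch of that restructuring, reduced to the COW range computation. Everything here is a toy stand-in (ClusterType, CowRanges and cow_ranges() are hypothetical names, not QEMU code); the point is only that the cluster types are looked at exclusively under keep_old, so a caller with keep_old=false could pass NULL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the qcow2 cluster types; hypothetical, illustration only. */
typedef enum { CLUSTER_NORMAL, CLUSTER_ZERO_ALLOC, CLUSTER_OTHER } ClusterType;

typedef struct {
    unsigned cow_start_from, cow_start_to;
    unsigned cow_end_from, cow_end_to;
    bool needs_meta;   /* false: no COW and no L2 update, nothing to record */
} CowRanges;

static unsigned round_up(unsigned x, unsigned align)
{
    return (x + align - 1) / align * align;
}

/*
 * @types[i] is the type of the i-th cluster touched by the request.
 * It is only dereferenced when @keep_old is true.
 */
static CowRanges cow_ranges(unsigned cluster_size, unsigned offset_in_cluster,
                            unsigned bytes, const ClusterType *types,
                            bool keep_old)
{
    CowRanges r = { .needs_meta = true };
    unsigned nb_clusters;

    r.cow_start_to = offset_in_cluster;
    r.cow_end_from = offset_in_cluster + bytes;
    nb_clusters = round_up(r.cow_end_from, cluster_size) / cluster_size;

    /* Defaults: copy everything outside the written area */
    r.cow_start_from = 0;
    r.cow_end_to = round_up(r.cow_end_from, cluster_size);

    if (keep_old) {
        unsigned i;

        for (i = 0; i < nb_clusters; i++) {
            if (types[i] != CLUSTER_NORMAL) {
                break;
            }
        }
        if (i == nb_clusters) {
            /* All clusters are normal and kept: no COW at all */
            r.needs_meta = false;
            return r;
        }
        if (types[0] == CLUSTER_NORMAL) {
            r.cow_start_from = r.cow_start_to;
        }
        if (types[nb_clusters - 1] == CLUSTER_NORMAL) {
            r.cow_end_to = r.cow_end_from;
        }
    }

    return r;
}
```

With 64k clusters, cow_ranges(65536, 4096, 8192, NULL, false) gives a head COW range of [0, 4096) and a tail range of [12288, 65536) without ever touching the type array.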


>   
>       *m = g_malloc0(sizeof(**m));
>       **m = (QCowL2Meta) {
> @@ -1089,18 +1132,22 @@ static void calculate_l2_meta(BlockDriverState *bs,
>       QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
>   }
>   
> -/* Returns true if writing to a cluster requires COW */
> -static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
> +/*
> + * Returns true if writing to the cluster pointed to by @l2_entry
> + * requires a new allocation (that is, if the cluster is unallocated
> + * or has refcount > 1 and therefore cannot be written in-place).
> + */
> +static bool cluster_needs_new_alloc(BlockDriverState *bs, uint64_t l2_entry)
>   {
>       switch (qcow2_get_cluster_type(bs, l2_entry)) {
>       case QCOW2_CLUSTER_NORMAL:
> +    case QCOW2_CLUSTER_ZERO_ALLOC:
>           if (l2_entry & QCOW_OFLAG_COPIED) {
>               return false;
>           }
>       case QCOW2_CLUSTER_UNALLOCATED:
>       case QCOW2_CLUSTER_COMPRESSED:
>       case QCOW2_CLUSTER_ZERO_PLAIN:
> -    case QCOW2_CLUSTER_ZERO_ALLOC:
>           return true;
>       default:
>           abort();
> @@ -1108,20 +1155,38 @@ static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
>   }
>   
>   /*
> - * Returns the number of contiguous clusters that can be used for an allocating
> - * write, but require COW to be performed (this includes yet unallocated space,
> - * which must copy from the backing file)
> + * Returns the number of contiguous clusters that can be written to
> + * using one single write request, starting from @l2_index.
> + * At most @nb_clusters are checked.
> + *
> + * If @new_alloc is true this counts clusters that are either
> + * unallocated, or allocated but with refcount > 1 (so they need to be
> + * newly allocated and COWed).
> + *
> + * If @new_alloc is false this counts clusters that are already
> + * allocated and can be overwritten in-place (this includes clusters
> + * of type QCOW2_CLUSTER_ZERO_ALLOC).
>    */
> -static int count_cow_clusters(BlockDriverState *bs, int nb_clusters,
> -    uint64_t *l2_slice, int l2_index)
> +static int count_single_write_clusters(BlockDriverState *bs, int nb_clusters,
> +                                       uint64_t *l2_slice, int l2_index,
> +                                       bool new_alloc)
>   {
> +    BDRVQcow2State *s = bs->opaque;
> +    uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index]);
> +    uint64_t expected_offset = l2_entry & L2E_OFFSET_MASK;
>       int i;
>   
>       for (i = 0; i < nb_clusters; i++) {
> -        uint64_t l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> -        if (!cluster_needs_cow(bs, l2_entry)) {
> +        l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
> +        if (cluster_needs_new_alloc(bs, l2_entry) != new_alloc) {
>               break;
>           }
> +        if (!new_alloc) {
> +            if (expected_offset != (l2_entry & L2E_OFFSET_MASK)) {
> +                break;
> +            }
> +            expected_offset += s->cluster_size;
> +        }
>       }
>   
>       assert(i <= nb_clusters);
> @@ -1192,10 +1257,10 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
>   }
>   
>   /*
> - * Checks how many already allocated clusters that don't require a copy on
> - * write there are at the given guest_offset (up to *bytes). If *host_offset is
> - * not INV_OFFSET, only physically contiguous clusters beginning at this host
> - * offset are counted.
> + * Checks how many already allocated clusters that don't require a new
> + * allocation there are at the given guest_offset (up to *bytes).
> + * If *host_offset is not INV_OFFSET, only physically contiguous clusters
> + * beginning at this host offset are counted.
>    *
>    * Note that guest_offset may not be cluster aligned. In this case, the
>    * returned *host_offset points to exact byte referenced by guest_offset and
> @@ -1204,12 +1269,12 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
>    * Returns:
>    *   0:     if no allocated clusters are available at the given offset.
>    *          *bytes is normally unchanged. It is set to 0 if the cluster
> - *          is allocated and doesn't need COW, but doesn't have the right
> - *          physical offset.
> + *          is allocated and can be overwritten in-place but doesn't have
> + *          the right physical offset.
>    *
> - *   1:     if allocated clusters that don't require a COW are available at
> - *          the requested offset. *bytes may have decreased and describes
> - *          the length of the area that can be written to.
> + *   1:     if allocated clusters that can be overwritten in place are
> + *          available at the requested offset. *bytes may have decreased
> + *          and describes the length of the area that can be written to.
>    *
>    *  -errno: in error cases
>    */
> @@ -1239,7 +1304,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>   
>       l2_index = offset_to_l2_slice_index(s, guest_offset);
>       nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
> -    assert(nb_clusters <= INT_MAX);
> +    /* Limit total byte count to BDRV_REQUEST_MAX_BYTES */
> +    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
>   
>       /* Find L2 entry for the first involved cluster */
>       ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
> @@ -1249,18 +1315,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>   
>       cluster_offset = be64_to_cpu(l2_slice[l2_index]);

It would be good to s/cluster_offset/l2_entry/ here.

Then again, "cluster_offset & L2E_OFFSET_MASK" is used so many times that I wouldn't just rename it; I'd keep both variables, l2_entry and cluster_offset.

>   
> -    /* Check how many clusters are already allocated and don't need COW */
> -    if (qcow2_get_cluster_type(bs, cluster_offset) == QCOW2_CLUSTER_NORMAL
> -        && (cluster_offset & QCOW_OFLAG_COPIED))
> -    {
> +    if (!cluster_needs_new_alloc(bs, cluster_offset)) {
>           /* If a specific host_offset is required, check it */
>           bool offset_matches =
>               (cluster_offset & L2E_OFFSET_MASK) == *host_offset;
>   
>           if (offset_into_cluster(s, cluster_offset & L2E_OFFSET_MASK)) {
> -            qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset "
> +            qcow2_signal_corruption(bs, true, -1, -1, "%s cluster offset "
>                                       "%#llx unaligned (guest offset: %#" PRIx64
> -                                    ")", cluster_offset & L2E_OFFSET_MASK,
> +                                    ")", cluster_offset & QCOW_OFLAG_ZERO ?
> +                                    "Preallocated zero" : "Data",
> +                                    cluster_offset & L2E_OFFSET_MASK,
>                                       guest_offset);
>               ret = -EIO;
>               goto out;
> @@ -1273,15 +1338,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>           }
>   
>           /* We keep all QCOW_OFLAG_COPIED clusters */
> -        keep_clusters =
> -            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
> -                                      &l2_slice[l2_index],
> -                                      QCOW_OFLAG_COPIED | QCOW_OFLAG_ZERO);
> +        keep_clusters = count_single_write_clusters(bs, nb_clusters, l2_slice,
> +                                                    l2_index, false);
>           assert(keep_clusters <= nb_clusters);
>   
>           *bytes = MIN(*bytes,
>                    keep_clusters * s->cluster_size
>                    - offset_into_cluster(s, guest_offset));
> +        assert(*bytes != 0);
> +
> +        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
> +                          *bytes, l2_slice, m, true);
>   
>           ret = 1;
>       } else {
> @@ -1357,9 +1424,10 @@ static int do_alloc_cluster_offset(BlockDriverState *bs, uint64_t guest_offset,
>   }
>   
>   /*
> - * Allocates new clusters for an area that either is yet unallocated or needs a
> - * copy on write. If *host_offset is not INV_OFFSET, clusters are only
> - * allocated if the new allocation can match the specified host offset.
> + * Allocates new clusters for an area that is either still unallocated or
> + * cannot be overwritten in-place. If *host_offset is not INV_OFFSET,
> + * clusters are only allocated if the new allocation can match the specified
> + * host offset.
>    *
>    * Note that guest_offset may not be cluster aligned. In this case, the
>    * returned *host_offset points to exact byte referenced by guest_offset and
> @@ -1382,12 +1450,10 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>       BDRVQcow2State *s = bs->opaque;
>       int l2_index;
>       uint64_t *l2_slice;
> -    uint64_t entry;
>       uint64_t nb_clusters;
>       int ret;
> -    bool keep_old_clusters = false;
>   
> -    uint64_t alloc_cluster_offset = INV_OFFSET;
> +    uint64_t alloc_cluster_offset;
>   
>       trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
>                                *bytes);
> @@ -1402,10 +1468,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>   
>       l2_index = offset_to_l2_slice_index(s, guest_offset);
>       nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
> -    assert(nb_clusters <= INT_MAX);
> -
> -    /* Limit total allocation byte count to INT_MAX */
> -    nb_clusters = MIN(nb_clusters, INT_MAX >> s->cluster_bits);
> +    /* Limit total allocation byte count to BDRV_REQUEST_MAX_BYTES */
> +    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
>   
>       /* Find L2 entry for the first involved cluster */
>       ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
> @@ -1413,67 +1477,32 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>           return ret;
>       }
>   
> -    entry = be64_to_cpu(l2_slice[l2_index]);
> -    nb_clusters = count_cow_clusters(bs, nb_clusters, l2_slice, l2_index);
> +    nb_clusters = count_single_write_clusters(bs, nb_clusters,
> +                                              l2_slice, l2_index, true);
>   
>       /* This function is only called when there were no non-COW clusters, so if
>        * we can't find any unallocated or COW clusters either, something is
>        * wrong with our code. */
>       assert(nb_clusters > 0);
>   
> -    if (qcow2_get_cluster_type(bs, entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
> -        (entry & QCOW_OFLAG_COPIED) &&
> -        (*host_offset == INV_OFFSET ||
> -         start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
> -    {
> -        int preallocated_nb_clusters;
> -
> -        if (offset_into_cluster(s, entry & L2E_OFFSET_MASK)) {
> -            qcow2_signal_corruption(bs, true, -1, -1, "Preallocated zero "
> -                                    "cluster offset %#llx unaligned (guest "
> -                                    "offset: %#" PRIx64 ")",
> -                                    entry & L2E_OFFSET_MASK, guest_offset);
> -            ret = -EIO;
> -            goto fail;
> -        }
> -
> -        /* Try to reuse preallocated zero clusters; contiguous normal clusters
> -         * would be fine, too, but count_cow_clusters() above has limited
> -         * nb_clusters already to a range of COW clusters */
> -        preallocated_nb_clusters =
> -            count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
> -                                      &l2_slice[l2_index], QCOW_OFLAG_COPIED);
> -        assert(preallocated_nb_clusters > 0);
> -
> -        nb_clusters = preallocated_nb_clusters;
> -        alloc_cluster_offset = entry & L2E_OFFSET_MASK;
> -
> -        /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
> -         * should not free them. */
> -        keep_old_clusters = true;
> +    /* Allocate at a given offset in the image file */
> +    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
> +        start_of_cluster(s, *host_offset);
> +    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> +                                  &nb_clusters);
> +    if (ret < 0) {
> +        goto out;
>       }
>   
> -    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);

Actually, we don't need l2_slice for keep_old=false in calculate_l2_meta(), so if
calculate_l2_meta() were modified a bit, this change to the function tail would not be needed.

Still, maybe l2_slice will be used in calculate_l2_meta() in later patches? We'll see.

> -
> -    if (alloc_cluster_offset == INV_OFFSET) {
> -        /* Allocate, if necessary at a given offset in the image file */
> -        alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
> -                               start_of_cluster(s, *host_offset);
> -        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> -                                      &nb_clusters);
> -        if (ret < 0) {
> -            goto fail;
> -        }
> -
> -        /* Can't extend contiguous allocation */
> -        if (nb_clusters == 0) {
> -            *bytes = 0;
> -            return 0;
> -        }
> -
> -        assert(alloc_cluster_offset != INV_OFFSET);
> +    /* Can't extend contiguous allocation */
> +    if (nb_clusters == 0) {
> +        *bytes = 0;
> +        ret = 0;
> +        goto out;
>       }
>   
> +    assert(alloc_cluster_offset != INV_OFFSET);
> +
>       /*
>        * Save info needed for meta data update.
>        *
> @@ -1496,13 +1525,14 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>       *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>       assert(*bytes != 0);
>   
> -    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
> -                      m, keep_old_clusters);
> +    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
> +                      m, false);
>   
> -    return 1;
> +    ret = 1;
>   
> -fail:
> -    if (*m && (*m)->nb_clusters > 0) {
> +out:
> +    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
> +    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
>           QLIST_REMOVE(*m, next_in_flight);
>       }


Hmm, unrelated to the patch, but why do we remove meta, which we didn't create?


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries
  2020-03-17 18:16 ` [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
@ 2020-04-09 12:22   ` Max Reitz
  2020-04-13 17:16     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Max Reitz @ 2020-04-09 12:22 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev


On 17.03.20 19:16, Alberto Garcia wrote:
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  tests/qemu-iotests/271     | 359 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/271.out | 244 +++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 604 insertions(+)
>  create mode 100755 tests/qemu-iotests/271
>  create mode 100644 tests/qemu-iotests/271.out
> 
> diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
> new file mode 100755
> index 0000000000..48f4d8d8ce
> --- /dev/null
> +++ b/tests/qemu-iotests/271

[...]

> +# Compare the bitmap of an extended L2 entry against an expected value
> +_verify_l2_bitmap()
> +{
> +    entry_no="$1"        # L2 entry number, starting from 0
> +    expected_alloc="$2"  # Space-separated list of allocated subcluster indexes
> +    expected_zero="$3"   # Space-separated list of zero subcluster indexes
> +
> +    offset=$(($l2_offset + $entry_no * 16))
> +    entry=`peek_file_be "$TEST_IMG" $offset 8`
> +    offset=$(($offset + 8))
> +    bitmap=`peek_file_be "$TEST_IMG" $offset 8`
> +
> +    expected_bitmap=0
> +    for bit in $expected_alloc; do
> +        expected_bitmap=$(($expected_bitmap | (1 << $bit)))
> +    done
> +    for bit in $expected_zero; do
> +        expected_bitmap=$(($expected_bitmap | (1 << (32 + $bit))))
> +    done
> +    expected_bitmap=`printf "%llu" $expected_bitmap`
> +
> +    printf "L2 entry #%d: 0x%016lx %016lx\n" "$entry_no" "$entry" "$bitmap"
> +    if [ "$bitmap" != "$expected_bitmap" ]; then
> +        printf "ERROR: expecting bitmap       0x%016lx\n" "$expected_bitmap"
> +    fi
> +}

Thanks! :)

[...]

> +# Test that writing to an image with subclusters produces the expected
> +# results, in images with and without backing files
> +for use_backing_file in yes no; do

[...]

> +    ### Write subcluster #31-#34 (cluster overlap) ###

#31-#33, I think.
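
The arithmetic behind that, as a small sketch. It assumes the geometry the test appears to use (64k clusters split into 32 subclusters of 2k each); the helper names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 64 KiB clusters, 32 subclusters of 2 KiB each. */
enum {
    CLUSTER_SIZE            = 64 * 1024,
    SUBCLUSTERS_PER_CLUSTER = 32,
    SUBCLUSTER_SIZE         = CLUSTER_SIZE / SUBCLUSTERS_PER_CLUSTER,
};

/* Subcluster containing @offset, counted from the start of the image */
static uint64_t first_subcluster(uint64_t offset)
{
    return offset / SUBCLUSTER_SIZE;
}

/* Last subcluster touched by a write of @bytes (> 0) starting at @offset */
static uint64_t last_subcluster(uint64_t offset, uint64_t bytes)
{
    return (offset + bytes - 1) / SUBCLUSTER_SIZE;
}
```

For 'write 63k 4k' this yields subclusters 31 through 33: bit 31 of L2 entry 0 plus bits 0 and 1 of L2 entry 1, which matches the alloc lists checked right below.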

> +    alloc="`seq 0 9` 16 31"; zero=""
> +    _test_write 'write -q -P 8 63k 4k' "$alloc" "$zero"
> +    alloc="0 1" ; zero=""
> +    _verify_l2_bitmap 1 "$alloc" "$zero"

[...]

> +    ### Partially zeroize an unallocated cluster (#3)
> +    if [ "$use_backing_file" = "yes" ]; then
> +        alloc="`seq 0 15`"; zero=""

Isn’t this a TODO?  (I.e., ideally we’d want the first 16 subclusters to
be zero, and the last 16 subclusters to be unallocated, right?)

(I’m asking because you did raise a TODO for the “Zero subcluster #1” test)

> +    else
> +        alloc=""; zero="`seq 0 31`"
> +    fi
> +    _test_write 'write -q -z 192k 32k' "$alloc" "$zero" 3
> +done

[...]

> +# Test that corrupted L2 entries are detected in both read and write
> +# operations
> +for corruption_test_cmd in read write; do

[...]

> +    echo
> +    echo "### Compressed cluster with subcluster bitmap != 0 - $corruption_test_cmd test ###"
> +    echo
> +    # We actually don't consider this a corrupted image.
> +    # The bitmap in compressed clusters is unused so QEMU should just ignore it.
> +    _make_test_img 1M
> +    $QEMU_IO -c 'write -q -P 11 -c 0 64k' "$TEST_IMG"
> +    poke_file "$TEST_IMG" $(($l2_offset+11)) "\x01\x01"
> +    alloc="24"; zero="0"
> +    _verify_l2_bitmap 0 "$alloc" "$zero"
> +    $QEMU_IO -c "$corruption_test_cmd -P 11 0 64k" "$TEST_IMG" | _filter_qemu_io

It might be interesting to see the bitmap after the write, i.e., that
it’s just been ignored.  Not necessary, though; the fact that the read
worked without error tells for sure that qemu ignores the bitmap.
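
For reference, a sketch of how the two poked bytes map onto alloc="24" and zero="0". It assumes the layout that _verify_l2_bitmap uses above: a 16-byte entry whose second 8 bytes hold the big-endian bitmap, with allocation bits in the low 32 bits and zero bits in the high 32 bits. load_be64() and expected_bitmap() are names made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit big-endian value from @buf */
static uint64_t load_be64(const uint8_t *buf)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}

/*
 * Build the bitmap the same way the shell helper does:
 * bit n set      -> subcluster n is allocated
 * bit 32 + n set -> subcluster n reads as zeroes
 */
static uint64_t expected_bitmap(const int *alloc, int n_alloc,
                                const int *zero, int n_zero)
{
    uint64_t bitmap = 0;
    for (int i = 0; i < n_alloc; i++) {
        bitmap |= UINT64_C(1) << alloc[i];
    }
    for (int i = 0; i < n_zero; i++) {
        bitmap |= UINT64_C(1) << (32 + zero[i]);
    }
    return bitmap;
}
```

Writing "\x01\x01" at byte offset 11 of the entry sets the lowest bit of bitmap byte 3 (bit 32) and of bitmap byte 4 (bit 24), so load_be64() on bytes 8..15 equals expected_bitmap() for alloc {24} and zero {0}.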

> +done
> +
> +echo
> +echo "### Image creation options ###"
> +echo
> +echo "# cluster_size < 16k"
> +IMGOPTS="extended_l2=on,cluster_size=8k" _make_test_img 1M
> +
> +echo "# backing file and preallocation=metadata"
> +IMGOPTS="extended_l2=on,preallocation=metadata" _make_test_img -b "$TEST_IMG.backing" 1M

TODO?

> +
> +echo "# backing file and preallocation=falloc"
> +IMGOPTS="extended_l2=on,preallocation=falloc" _make_test_img -b "$TEST_IMG.backing" 1M
> +
> +echo "# backing file and preallocation=full"
> +IMGOPTS="extended_l2=on,preallocation=full" _make_test_img -b "$TEST_IMG.backing" 1M
> +
> +echo
> +echo "### qemu-img measure ###"
> +echo
> +echo "# 512MB, extended_l2=off" # This needs one L2 table
> +$QEMU_IMG measure --size 512M -O qcow2 -o extended_l2=off
> +echo "# 512MB, extended_l2=on"  # This needs two L2 tables
> +$QEMU_IMG measure --size 512M -O qcow2 -o extended_l2=on
> +
> +echo "# 16K clusters, 64GB, extended_l2=off" # This needs one L1 table

You mean one full L1 table cluster?

> +$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=off
> +echo "# 16K clusters, 64GB, extended_l2=on"  # This needs two L2 tables

And two full L1 table clusters?
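
The numbers can be checked with a quick sketch, assuming one L2 table occupies exactly one cluster, L2 entries are 8 bytes (16 with extended_l2) and L1 entries are 8 bytes. The function names are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/* Bytes of guest data mapped by one L2 table (one table = one cluster) */
static uint64_t l2_table_coverage(uint64_t cluster_size, bool extended_l2)
{
    uint64_t entry_size = extended_l2 ? 16 : 8;
    return cluster_size / entry_size * cluster_size;
}

/* Number of L2 tables needed to map an image of @size bytes */
static uint64_t l2_tables_needed(uint64_t size, uint64_t cluster_size,
                                 bool extended_l2)
{
    return div_round_up(size, l2_table_coverage(cluster_size, extended_l2));
}

/* Clusters occupied by the L1 table: one 8-byte entry per L2 table */
static uint64_t l1_clusters_needed(uint64_t size, uint64_t cluster_size,
                                   bool extended_l2)
{
    uint64_t l1_bytes = l2_tables_needed(size, cluster_size, extended_l2) * 8;
    return div_round_up(l1_bytes, cluster_size);
}
```

Under those assumptions 512M with 64k clusters needs one L2 table (two with extended_l2=on), and 64G with 16k clusters needs an L1 table of exactly one cluster (two with extended_l2=on).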

Max

> +$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=on
> +
> +echo "# 8k clusters" # This should fail
> +$QEMU_IMG measure --size 1M -O qcow2 -o cluster_size=8k,extended_l2=on
> +
> +echo "# 1024 TB" # Maximum allowed size with extended_l2=on and 64K clusters
> +$QEMU_IMG measure --size 1024T -O qcow2 -o extended_l2=on
> +echo "# 1025 TB" # This should fail
> +$QEMU_IMG measure --size 1025T -O qcow2 -o extended_l2=on
> +
> +# success, all done
> +echo "*** done"
> +rm -f $seq.full
> +status=0
> +


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-04-09  7:57     ` Vladimir Sementsov-Ogievskiy
@ 2020-04-09 14:35       ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-09 14:35 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block

On Thu 09 Apr 2020 09:57:59 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
> What about squashing this:
>
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -615,32 +615,34 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
>           break;
>       case QCOW2_CLUSTER_ZERO_ALLOC:
>       case QCOW2_CLUSTER_NORMAL:
> +    {
> +        uint64_t host_cluster_offset = l2_entry & L2E_OFFSET_MASK;
> +        *host_offset = host_cluster_offset + offset_in_cluster;

Ok, that looks good (I'll put the brace on the 'case' line though).

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
  2020-04-09  7:50   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-09 14:45     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-09 14:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Thu 09 Apr 2020 09:50:52 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>       case QCOW2_CLUSTER_ZERO_PLAIN:
>>       case QCOW2_CLUSTER_UNALLOCATED:
>>           /* how many empty clusters ? */
>>           c = count_contiguous_clusters_unallocated(bs, nb_clusters,
>>                                                     &l2_slice[l2_index], type);
>> -        *cluster_offset = 0;
>> +        *host_offset = 0;
>
> Actually, dead assignment now.. But I feel that better to keep it.
>
> Hmm. May be, drop the first assignment of zero to host_offset? We
> actually don't need it, user should not rely on host_offset if we
> return an error.

Yeah, I'll drop the first one and keep this one.

>> @@ -3735,7 +3726,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>>           offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
>>           bytes = s->cluster_size;
>
> Unrelated to the patch, but.. Why we change bytes?? So, we can finish
> with success, but zero-out only first cluster?
>
> Ah, found, generic block-layer take care of it and never issue
> unaligned requests crossing cluster boundary.

That's right, hence the assert(head + bytes <= s->cluster_size); a few
lines before.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 28/30] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
  2020-03-17 18:16 ` [PATCH v4 28/30] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
@ 2020-04-09 14:49   ` Eric Blake
  0 siblings, 0 replies; 128+ messages in thread
From: Eric Blake @ 2020-04-09 14:49 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 3/17/20 1:16 PM, Alberto Garcia wrote:
> Now that the implementation of subclusters is complete we can finally
> add the necessary options to create and read images with this feature,
> which we call "extended L2 entries".
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---

> +++ b/qapi/block-core.json
> @@ -66,6 +66,9 @@
>   #                 standalone (read-only) raw image without looking at qcow2
>   #                 metadata (since: 4.0)
>   #
> +# @extended-l2: true if the image has extended L2 entries; only valid for
> +#               compat >= 1.1 (since 5.0)

Looks like we'll have to tweak this to 5.1 now (multiple spots).

> +++ b/block/qcow2.h
> @@ -231,13 +231,16 @@ enum {
>       QCOW2_INCOMPAT_DIRTY_BITNR      = 0,
>       QCOW2_INCOMPAT_CORRUPT_BITNR    = 1,
>       QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
> +    QCOW2_INCOMPAT_EXTL2_BITNR      = 4,

Why are we skipping bit 3?  (Hmm, now I have to go find the earlier 
patch that touched the spec...)

aha - the spec documented compression bits (the spec change for those 
made 5.0, but the qcow2 implementation did not); we'll have to rebase 
depending on what lands first.  But the resolution of that merge 
conflict will result in both feature bits eventually existing here.
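
As a small illustration of how the bit numbers relate to the masks that show up in error messages (for example the "incompatible features 0x10 set" line in the 061 test output), here is a sketch. The TOY_ names and both helpers are invented for the example and are not the QEMU definitions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bit numbers in the incompatible_features header field. Bits 0, 1, 2
 * and 4 are the ones in the hunk above; the TOY_ names are made up.
 */
enum {
    TOY_INCOMPAT_DIRTY_BITNR     = 0,
    TOY_INCOMPAT_CORRUPT_BITNR   = 1,
    TOY_INCOMPAT_DATA_FILE_BITNR = 2,
    /* bit 3: taken by the compression feature documented in the spec */
    TOY_INCOMPAT_EXTL2_BITNR     = 4,
};

/* Mask corresponding to a feature bit number */
static uint64_t incompat_mask(unsigned bitnr)
{
    return UINT64_C(1) << bitnr;
}

/* Bits set in @features that a reader supporting @known does not understand */
static uint64_t unknown_incompat_bits(uint64_t features, uint64_t known)
{
    return features & ~known;
}
```

Bit 4 corresponds to mask 0x10, so a reader that only knows bits 0 to 2 (mask 0x7) would report 0x10 as unknown for an image with extended L2 entries.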

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 03/30] qcow2: Add calculate_l2_meta()
  2020-04-09  8:30   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-09 15:12     ` Alberto Garcia
  2020-04-09 18:47       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-09 15:12 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Thu 09 Apr 2020 10:30:13 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> +static void calculate_l2_meta(BlockDriverState *bs,
>> +                              uint64_t host_cluster_offset,
>> +                              uint64_t guest_offset, unsigned bytes,
>> +                              QCowL2Meta **m, bool keep_old)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +    unsigned cow_start_from = 0;
>> +    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>> +    unsigned cow_end_from = cow_start_to + bytes;
>> +    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>> +    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>> +    QCowL2Meta *old_m = *m;
>> +
>> +    *m = g_malloc0(sizeof(**m));
>> +    **m = (QCowL2Meta) {
>> +        .next           = old_m,
>> +
>> +        .alloc_offset   = host_cluster_offset,
>> +        .offset         = start_of_cluster(s, guest_offset),
>> +        .nb_clusters    = nb_clusters,
>> +
>> +        .keep_old_clusters = keep_old,
>> +
>> +        .cow_start = {
>> +            .offset     = cow_start_from,
>> +            .nb_bytes   = cow_start_to - cow_start_from,
>> +        },
>> +        .cow_end = {
>> +            .offset     = cow_end_from,
>
> Hmm. So, you make it equal to requested_bytes from handle_alloc().

No, requested_bytes from handle_alloc is:

   requested_bytes = *bytes + offset_into_cluster(s, guest_offset);

But *bytes is later modified before calling calculate_l2_meta():

   *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));

More details here:

   https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01808.html
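
To make the difference concrete, here is a minimal standalone sketch of
the arithmetic, not the actual QEMU code. The names clamp_bytes(),
compute_cow_ranges() and struct cow_ranges are hypothetical, cluster_size
stands in for s->cluster_size (assumed to be a power of two), and
nb_bytes is assumed to count from the start of the first cluster:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values computed in calculate_l2_meta(). */
struct cow_ranges {
    uint64_t cow_start_from;
    uint64_t cow_start_to;
    uint64_t cow_end_from;
    uint64_t cow_end_to;
    uint64_t nb_clusters;
};

/*
 * Mirrors the MIN() quoted above. Assumption: nb_bytes is the size of the
 * allocated area, counted from the start of the first cluster.
 */
static uint64_t clamp_bytes(uint64_t bytes, uint64_t nb_bytes,
                            uint64_t offset_in_cluster)
{
    uint64_t avail;

    assert(nb_bytes >= offset_in_cluster);
    avail = nb_bytes - offset_in_cluster;
    return bytes < avail ? bytes : avail;
}

/* Same arithmetic as the quoted patch, with the state passed explicitly. */
static struct cow_ranges compute_cow_ranges(uint64_t guest_offset,
                                            uint64_t bytes,
                                            uint64_t cluster_size)
{
    struct cow_ranges r;

    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);

    r.cow_start_from = 0;
    /* offset_into_cluster() */
    r.cow_start_to = guest_offset & (cluster_size - 1);
    r.cow_end_from = r.cow_start_to + bytes;
    /* ROUND_UP(cow_end_from, cluster_size) */
    r.cow_end_to = (r.cow_end_from + cluster_size - 1) & ~(cluster_size - 1);
    /* size_to_clusters(cow_end_from) */
    r.nb_clusters = r.cow_end_to / cluster_size;
    return r;
}
```

With a request that is larger than the allocated area, the clamped value
of bytes makes cow_end_from smaller than requested_bytes.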

Berto



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-03-17 18:16 ` [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature Alberto Garcia
  2020-04-08 11:09   ` Max Reitz
@ 2020-04-09 15:12   ` Eric Blake
  2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
                       ` (2 more replies)
  1 sibling, 3 replies; 128+ messages in thread
From: Eric Blake @ 2020-04-09 15:12 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 3/17/20 1:16 PM, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>   docs/qcow2-cache.txt   | 19 +++++++++++-
>   2 files changed, 83 insertions(+), 4 deletions(-)
> 

> +== Extended L2 Entries ==
> +
> +An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
> +field of the header.
> +
> +In these images standard data clusters are divided into 32 subclusters of the
> +same size. They are contiguous and start from the beginning of the cluster.
> +Subclusters can be allocated independently and the L2 entry contains information
> +indicating the status of each one of them. Compressed data clusters don't have
> +subclusters so they are treated the same as in images without this feature.
> +
> +The size of an extended L2 entry is 128 bits so the number of entries per table
> +is calculated using this formula:
> +
> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
> +
> +The first 64 bits have the same format as the standard L2 table entry described
> +in the previous section, with the exception of bit 0 of the standard cluster
> +descriptor.
> +
> +The last 64 bits contain a subcluster allocation bitmap with this format:
> +
> +Subcluster Allocation Bitmap (for standard clusters):
> +
> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
> +
> +                    1: the subcluster is allocated. In this case the
> +                       host cluster offset field must contain a valid
> +                       offset.
> +                    0: the subcluster is not allocated. In this case
> +                       read requests shall go to the backing file or
> +                       return zeros if there is no backing file data.

Hmm - raw external files are incompatible with backing files.  Should we 
also document that extended L2 entries are incompatible with raw 
external files?  (The text here reminded me about it, but it would be 
the text earlier at the incompatible feature bits that we edit if we 
want that additional restriction; compare to the restriction in the 
autoclear bit 1).  After all, when raw external file is enabled, the 
entire image is allocated, at which point subclusters don't make much sense.

And in stating that, it looks like we have a pre-existing hole in that 
header bytes 8-15 don't mention the incompatibility with autoclear (when 
things are incompatible, it's best to mention the restriction from both 
sides, rather than only one of the sides, to make sure the reader 
notices the restriction regardless of which field they look up first). 
But tweaking that would be a separate patch.
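
For reference, the layout described in the quoted documentation can be
sketched roughly as follows. This is only an illustration under the
assumptions stated in the quoted text (32 subclusters per cluster,
128-bit entries, allocation bits in bits 0-31 of the bitmap); the
function names are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the quoted text: standard clusters are split into 32 subclusters. */
#define SKETCH_SUBCLUSTERS_PER_CLUSTER 32

/* l2_entries = cluster_size / (2 * sizeof(uint64_t)) */
static uint64_t sketch_extl2_entries_per_table(uint64_t cluster_size)
{
    return cluster_size / (2 * sizeof(uint64_t));
}

/* All subclusters have the same size and are contiguous in the cluster. */
static uint64_t sketch_subcluster_size(uint64_t cluster_size)
{
    return cluster_size / SKETCH_SUBCLUSTERS_PER_CLUSTER;
}

/* Bits 0-31 of the bitmap: one allocation bit per subcluster. */
static bool sketch_subcluster_is_allocated(uint64_t bitmap, unsigned sc)
{
    assert(sc < SKETCH_SUBCLUSTERS_PER_CLUSTER);
    return (bitmap >> sc) & 1;
}
```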

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
  2020-04-09 10:59   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-09 16:08     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-09 16:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Thu 09 Apr 2020 12:59:30 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>   static void calculate_l2_meta(BlockDriverState *bs,
>>                                 uint64_t host_cluster_offset,
>>                                 uint64_t guest_offset, unsigned bytes,
>> -                              QCowL2Meta **m, bool keep_old)
>> +                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>> -    unsigned cow_start_from = 0;
>> +    int l2_index = offset_to_l2_slice_index(s, guest_offset);
>> +    uint64_t l2_entry;
>> +    unsigned cow_start_from, cow_end_to;
>>       unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>>       unsigned cow_end_from = cow_start_to + bytes;
>> -    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>>       unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>>       QCowL2Meta *old_m = *m;
>> +    QCow2ClusterType type;
>> +
>> +    assert(nb_clusters <= s->l2_slice_size - l2_index);
>> +
>> +    /* Return if there's no COW (all clusters are normal and we keep them) */
>> +    if (keep_old) {
>> +        int i;
>> +        for (i = 0; i < nb_clusters; i++) {
>> +            l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
>> +            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
>
> Could we also allow full ZERO_ALLOC clusters here?

No, because the L2 entry needs to be modified (in order to remove the
'all zeroes' bit) and we need to create a QCowL2Meta entry for that (see
qcow2_handle_l2meta()).

>> +    /* Get the L2 entry of the first cluster */
>> +    l2_entry = be64_to_cpu(l2_slice[l2_index]);
>> +    type = qcow2_get_cluster_type(bs, l2_entry);
>> +
>> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
>> +        cow_start_from = cow_start_to;
>> +    } else {
>> +        cow_start_from = 0;
>> +    }
>> +
>> +    /* Get the L2 entry of the last cluster */
>> +    l2_entry = be64_to_cpu(l2_slice[l2_index + nb_clusters - 1]);
>> +    type = qcow2_get_cluster_type(bs, l2_entry);
>> +
>> +    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
>> +        cow_end_to = cow_end_from;
>> +    } else {
>> +        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>> +    }
>
> These two ifs may be moved into if (keep_old), and drop "&& keep_old"
> from conditions. This also will allow to drop extra calculations, move
> new variables to if (keep_old) {} block and allow to pass
> l2_slice=NULL together with keep_old=false.

In subsequent patches we're going to have more cases than just
QCOW2_CLUSTER_NORMAL so I don't think it makes sense to move the
keep_old check around.

>> @@ -1239,7 +1304,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>>   
>>       l2_index = offset_to_l2_slice_index(s, guest_offset);
>>       nb_clusters = MIN(nb_clusters, s->l2_slice_size - l2_index);
>> -    assert(nb_clusters <= INT_MAX);
>> +    /* Limit total byte count to BDRV_REQUEST_MAX_BYTES */
>> +    nb_clusters = MIN(nb_clusters, BDRV_REQUEST_MAX_BYTES >> s->cluster_bits);
>>   
>>       /* Find L2 entry for the first involved cluster */
>>       ret = get_cluster_table(bs, guest_offset, &l2_slice, &l2_index);
>> @@ -1249,18 +1315,17 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>>   
>>       cluster_offset = be64_to_cpu(l2_slice[l2_index]);
>
> It would be good to s/cluster_offset/l2_entry/
>
> And, "cluster_offset & L2E_OFFSET_MASK" is used so many times, so, I'd
> not substitute, but keep both variables: l2_entry and cluster_offset.

Sounds good, I can change that.

>> +    /* Allocate at a given offset in the image file */
>> +    alloc_cluster_offset = *host_offset == INV_OFFSET ? INV_OFFSET :
>> +        start_of_cluster(s, *host_offset);
>> +    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
>> +                                  &nb_clusters);
>> +    if (ret < 0) {
>> +        goto out;
>>       }
>>   
>> -    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
>
> actually we don't need l2_slice for keep_old=false in
> calculate_l2_meta, so if calculate_l2_meta modified a bit, change of
> function tail is not needed..
>
> Still, may be l2_slice will be used in calculate_l2_meta() in further
> patches? Will see..

We'll need it in a later patch.

>> -fail:
>> -    if (*m && (*m)->nb_clusters > 0) {
>> +out:
>> +    qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
>> +    if (ret < 0 && *m && (*m)->nb_clusters > 0) {
>>           QLIST_REMOVE(*m, next_in_flight);
>>       }
>
> Hmm, unrelated to the patch, but why do we remove meta, which we
> didn't create?

Not sure actually, I would need to check further...

Berto



* Re: [PATCH v4 03/30] qcow2: Add calculate_l2_meta()
  2020-04-09 15:12     ` Alberto Garcia
@ 2020-04-09 18:47       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-09 18:47 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

09.04.2020 18:12, Alberto Garcia wrote:
> On Thu 09 Apr 2020 10:30:13 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>> +static void calculate_l2_meta(BlockDriverState *bs,
>>> +                              uint64_t host_cluster_offset,
>>> +                              uint64_t guest_offset, unsigned bytes,
>>> +                              QCowL2Meta **m, bool keep_old)
>>> +{
>>> +    BDRVQcow2State *s = bs->opaque;
>>> +    unsigned cow_start_from = 0;
>>> +    unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>>> +    unsigned cow_end_from = cow_start_to + bytes;
>>> +    unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>>> +    unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>>> +    QCowL2Meta *old_m = *m;
>>> +
>>> +    *m = g_malloc0(sizeof(**m));
>>> +    **m = (QCowL2Meta) {
>>> +        .next           = old_m,
>>> +
>>> +        .alloc_offset   = host_cluster_offset,
>>> +        .offset         = start_of_cluster(s, guest_offset),
>>> +        .nb_clusters    = nb_clusters,
>>> +
>>> +        .keep_old_clusters = keep_old,
>>> +
>>> +        .cow_start = {
>>> +            .offset     = cow_start_from,
>>> +            .nb_bytes   = cow_start_to - cow_start_from,
>>> +        },
>>> +        .cow_end = {
>>> +            .offset     = cow_end_from,
>>
>> Hmm. So, you make it equal to requested_bytes from handle_alloc().
> 
> No, requested_bytes from handle_alloc is:
> 
>     requested_bytes = *bytes + offset_into_cluster(s, guest_offset);
> 
> But *bytes is later modified before calling calculate_l2_meta():
> 
>     *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
> 
> More details here:
> 
>     https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01808.html
> 

Ahah, me again, sorry :)



-- 
Best regards,
Vladimir



* Re: [PATCH v4 06/30] qcow2: Add get_l2_entry() and set_l2_entry()
  2020-03-17 18:16 ` [PATCH v4 06/30] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
@ 2020-04-10  8:48   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-10  8:48 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> The size of an L2 entry is 64 bits, but if we want to have subclusters
> we need extended L2 entries. This means that we have to access L2
> tables and slices differently depending on whether an image has
> extended L2 entries or not.
> 
> This patch replaces all l2_slice[] accesses with calls to
> get_l2_entry() and set_l2_entry().

and it replaces some l2_table[] as well.

I found one not-updated case, in qcow2-refcount.c:

        ret = bdrv_pwrite_sync(bs->file, l2e_offset,
                               &l2_table[i], sizeof(uint64_t));

On the other hand, if l2_table is extended somehow, this will have to be
updated in a different way anyway, as here we don't read the L2 entry but
write it...

Also, I don't quite like the naming: in a later patch you update the
interface to be [gs]et_l2_entry and [gs]et_l2_bitmap. But get_l2_entry
doesn't return the whole entry, only one half of it, and the same goes
for set_l2_entry...

Maybe it would be good to add a comment above the [gs]et_l2_entry
definitions.

anyway,
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir



* Re: [PATCH v4 08/30] qcow2: Add dummy has_subclusters() function
  2020-03-17 18:16 ` [PATCH v4 08/30] qcow2: Add dummy has_subclusters() function Alberto Garcia
@ 2020-04-10  9:11   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-10  9:11 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> This function will be used by the qcow2 code to check if an image has
> subclusters or not.
> 
> At the moment this simply returns false. Once all patches needed for
> subcluster support are ready then QEMU will be able to create and
> read images with subclusters and this function will return the actual
> value.
> 
> Signed-off-by: Alberto Garcia<berto@igalia.com>
> Reviewed-by: Eric Blake<eblake@redhat.com>
> Reviewed-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-09 15:12   ` Eric Blake
@ 2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
  2020-04-14 14:50       ` Alberto Garcia
  2020-04-15 19:11       ` Alberto Garcia
  2020-04-10 12:01     ` Alberto Garcia
  2020-04-14 18:16     ` Alberto Garcia
  2 siblings, 2 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-10  9:29 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

09.04.2020 18:12, Eric Blake wrote:
> On 3/17/20 1:16 PM, Alberto Garcia wrote:
>> Subcluster allocation in qcow2 is implemented by extending the
>> existing L2 table entries and adding additional information to
>> indicate the allocation status of each subcluster.
>>
>> This patch documents the changes to the qcow2 format and how they
>> affect the calculation of the L2 cache size.
>>
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>>   docs/qcow2-cache.txt   | 19 +++++++++++-
>>   2 files changed, 83 insertions(+), 4 deletions(-)
>>
> 
>> +== Extended L2 Entries ==
>> +
>> +An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
>> +field of the header.
>> +
>> +In these images standard data clusters are divided into 32 subclusters of the
>> +same size. They are contiguous and start from the beginning of the cluster.
>> +Subclusters can be allocated independently and the L2 entry contains information
>> +indicating the status of each one of them. Compressed data clusters don't have
>> +subclusters so they are treated the same as in images without this feature.
>> +
>> +The size of an extended L2 entry is 128 bits so the number of entries per table
>> +is calculated using this formula:
>> +
>> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
>> +
>> +The first 64 bits have the same format as the standard L2 table entry described
>> +in the previous section, with the exception of bit 0 of the standard cluster
>> +descriptor.
>> +
>> +The last 64 bits contain a subcluster allocation bitmap with this format:
>> +
>> +Subcluster Allocation Bitmap (for standard clusters):
>> +
>> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
>> +
>> +                    1: the subcluster is allocated. In this case the
>> +                       host cluster offset field must contain a valid
>> +                       offset.
>> +                    0: the subcluster is not allocated. In this case
>> +                       read requests shall go to the backing file or
>> +                       return zeros if there is no backing file data.
> 
> Hmm - raw external files are incompatible with backing files.  Should we also document that extended L2 entries are incompatible with raw external files?  (The text here reminded me about it, but it would be the text earlier at the incompatible feature bits that we edit if we want that additional restriction; compare to the restriction in the autoclear bit 1).  After all, when raw external file is enabled, the entire image is allocated, at which point subclusters don't make much sense.

It may still cache information about zeroed subclusters, which gives a
more detailed block status. But we should mention external files somehow.
Hm, and not only for raw external files: it is documented that a cluster
can't be unallocated when an external data file is used.

> 
> And in stating that, it looks like we have a pre-existing hole in that header bytes 8-15 don't mention the incompatibility with autoclear (when things are incompatible, it's best to mention the restriction from both sides, rather than only one of the sides, to make sure the reader notices the restriction regardless of which field they look up first). But tweaking that would be a separate patch.
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State
  2020-03-17 18:16 ` [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
  2020-04-08 11:12   ` Max Reitz
@ 2020-04-10  9:45   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-10  9:45 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> This patch adds the following new fields to BDRVQcow2State:
> 
> - subclusters_per_cluster: Number of subclusters in a cluster
> - subcluster_size: The size of each subcluster, in bytes
> - subcluster_bits: No. of bits so 1 << subcluster_bits = subcluster_size
> 
> Images without subclusters are treated as if they had exactly one,

exactly one subcluster per cluster...

> with subcluster_size = cluster_size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>



-- 
Best regards,
Vladimir



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-09 15:12   ` Eric Blake
  2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
@ 2020-04-10 12:01     ` Alberto Garcia
  2020-04-14 18:16     ` Alberto Garcia
  2 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-10 12:01 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 09 Apr 2020 05:12:16 PM CEST, Eric Blake wrote:
> Hmm - raw external files are incompatible with backing files.  Should
> we also document that extended L2 entries are incompatible with raw
> external files?

Ok, I can also add additional checks to forbid creating such images.

Berto



* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-04-09 10:05   ` Max Reitz
@ 2020-04-10 12:47     ` Alberto Garcia
  2020-04-14 10:13       ` Max Reitz
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-10 12:47 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 09 Apr 2020 12:05:12 PM CEST, Max Reitz wrote:
>>          switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>          case QCOW2_CLUSTER_UNALLOCATED:
>> -            if (full_discard || !bs->backing) {
>> +            if (full_discard) {
>> +                /* If the image has extended L2 entries we can only
>> +                 * skip this operation if the L2 bitmap is zero. */
>> +                uint64_t bitmap = has_subclusters(s) ?
>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>
> Isn’t this bitmap only valid for standard clusters?  In this case, the
> whole cluster is unallocated, so the bitmap shouldn’t be relevant,
> AFAIU.

I'm not sure if I follow you.

An unallocated cluster can still have QCOW_OFLAG_SUB_ZERO set in some of
its subclusters. Those read as zeroes and the rest go to the backing
file.

After a full discard all subclusters should be completely deallocated so
those bits should be cleared.

If the bitmap is already 0 (the whole cluster is already unallocated) or
if the image does not have extended L2 entries (which also means that
the whole cluster is already unallocated) then we can skip the discard.
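
The logic can be sketched like this (an illustration only, not the code
from the patch). The function names are hypothetical, and the position of
the QCOW_OFLAG_SUB_ZERO bits (bits 32-63 of the bitmap, one per
subcluster) is an assumption based on the v4 bitmap layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_sc_read {
    SKETCH_SC_READ_BACKING, /* the read goes to the backing file */
    SKETCH_SC_READ_ZERO,    /* the read returns zeroes */
};

/*
 * How a subcluster of an unallocated cluster is read.
 * Assumption: the "reads as zeroes" bit of subcluster sc is bit 32 + sc.
 */
static enum sketch_sc_read sketch_read_unallocated_sc(uint64_t bitmap,
                                                      unsigned sc)
{
    assert(sc < 32);
    return ((bitmap >> (32 + sc)) & 1) ? SKETCH_SC_READ_ZERO
                                       : SKETCH_SC_READ_BACKING;
}

/*
 * A full discard of an unallocated cluster can be skipped only if there
 * is nothing left to clear: either the image has no extended L2 entries
 * or the bitmap is already zero.
 */
static bool sketch_full_discard_can_be_skipped(bool has_subclusters,
                                               uint64_t l2_bitmap)
{
    uint64_t bitmap = has_subclusters ? l2_bitmap : 0;

    return bitmap == 0;
}
```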

Berto



* Re: [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters
  2020-04-09 10:27   ` Max Reitz
@ 2020-04-10 16:42     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-10 16:42 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 09 Apr 2020 12:27:36 PM CEST, Max Reitz wrote:
>> +=== Testing version downgrade with extended L2 entries ===
>> +
>> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
>> +qemu-img: Cannot downgrade an image with incompatible features 0x10 set
>
> This test fails in this commit, because extended_l2 is only available
> after the next commit.  The code changes and the test itself look good
> to me, though.

You're right, thanks! Since this one only adds an assertion I'll just
swap both commits.

Berto



* Re: [PATCH v4 10/30] qcow2: Add offset_to_sc_index()
  2020-03-17 18:16 ` [PATCH v4 10/30] qcow2: Add offset_to_sc_index() Alberto Garcia
@ 2020-04-13 11:02   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-13 11:02 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> For a given offset, return the subcluster number within its cluster
> (i.e. with 32 subclusters per cluster it returns a number between 0
> and 31).
> 
> Signed-off-by: Alberto Garcia<berto@igalia.com>
> Reviewed-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
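
As a side note, the calculation described in the commit message can be
sketched as below. This is a standalone approximation, not the patch
itself: the function name is hypothetical and cluster_bits and
subcluster_bits are passed explicitly instead of being read from
BDRVQcow2State:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the subcluster that contains 'offset', relative to
 * the start of its cluster. An image without subclusters is modelled as
 * subcluster_bits == cluster_bits (exactly one subcluster per cluster).
 */
static unsigned sketch_offset_to_sc_index(uint64_t offset,
                                          unsigned cluster_bits,
                                          unsigned subcluster_bits)
{
    unsigned subclusters_per_cluster;

    assert(subcluster_bits <= cluster_bits);
    assert(cluster_bits - subcluster_bits < 32);
    subclusters_per_cluster = 1u << (cluster_bits - subcluster_bits);

    return (unsigned)(offset >> subcluster_bits) &
           (subclusters_per_cluster - 1);
}
```

With 64k clusters and 32 subclusters per cluster, subcluster_bits is 11
(2k subclusters) and the result is always between 0 and 31.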

-- 
Best regards,
Vladimir



* Re: [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries
  2020-04-09 12:22   ` Max Reitz
@ 2020-04-13 17:16     ` Alberto Garcia
  2020-04-14 10:14       ` Max Reitz
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-13 17:16 UTC (permalink / raw)
  To: Max Reitz, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 09 Apr 2020 02:22:37 PM CEST, Max Reitz wrote:
>> +    ### Write subcluster #31-#34 (cluster overlap) ###
>
> #31-#34, I think.

That's what I wrote :-?

>> +    ### Partially zeroize an unallocated cluster (#3)
>> +    if [ "$use_backing_file" = "yes" ]; then
>> +        alloc="`seq 0 15`"; zero=""
>
> Isn’t this a TODO?  (I.e., ideally we’d want the first 16 subclusters
> to be zero, and the last 16 subclusters to be unallocated, right?)
>
> (I’m asking because you did raise a TODO for the “Zero subcluster #1”
> test)

Maybe, but I just implemented zeroize at the subcluster level :-) Wait
for the next version of the series.

>> +    echo
>> +    echo "### Compressed cluster with subcluster bitmap != 0 - $corruption_test_cmd test ###"
>> +    echo
>> +    # We actually don't consider this a corrupted image.
>> +    # The bitmap in compressed clusters is unused so QEMU should just ignore it.
>> +    _make_test_img 1M
>> +    $QEMU_IO -c 'write -q -P 11 -c 0 64k' "$TEST_IMG"
>> +    poke_file "$TEST_IMG" $(($l2_offset+11)) "\x01\x01"
>> +    alloc="24"; zero="0"
>> +    _verify_l2_bitmap 0 "$alloc" "$zero"
>> +    $QEMU_IO -c "$corruption_test_cmd -P 11 0 64k" "$TEST_IMG" | _filter_qemu_io
>
> It might be interesting to see the bitmap after the write, i.e., that
> it’s just been ignored.

Yeah, why not.

>> +echo "# 16K clusters, 64GB, extended_l2=off" # This needs one L1 table
>
> You mean one full L1 table cluster?
>
>> +$QEMU_IMG measure --size 64G -O qcow2 -o cluster_size=16k,extended_l2=off
>> +echo "# 16K clusters, 64GB, extended_l2=on"  # This needs two L2 tables
>
> And two full L1 table clusters?

You're right, I'll correct that.
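
The arithmetic behind those two comments, as a small sketch. The function
name is hypothetical; it assumes 8-byte L1 entries and 8-byte or 16-byte
L2 entries depending on extended_l2:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of clusters needed to hold the L1 table of an image. */
static uint64_t sketch_l1_table_clusters(uint64_t virtual_size,
                                         uint64_t cluster_size,
                                         bool extended_l2)
{
    uint64_t l2_entry_size = extended_l2 ? 16 : 8;
    uint64_t l2_entries, bytes_per_l2, l1_entries, l1_bytes;

    assert(cluster_size >= l2_entry_size);

    /* Each L2 table is one cluster and maps l2_entries data clusters */
    l2_entries = cluster_size / l2_entry_size;
    bytes_per_l2 = l2_entries * cluster_size;

    /* One L1 entry (8 bytes) per L2 table */
    l1_entries = (virtual_size + bytes_per_l2 - 1) / bytes_per_l2;
    l1_bytes = l1_entries * sizeof(uint64_t);

    return (l1_bytes + cluster_size - 1) / cluster_size;
}
```

For 16K clusters and 64GB this gives 2048 L1 entries (16K, one cluster)
with extended_l2=off, and 4096 L1 entries (32K, two clusters) with
extended_l2=on.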

Berto



* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-03-17 18:16 ` [PATCH v4 11/30] qcow2: Add l2_entry_size() Alberto Garcia
@ 2020-04-14  9:44   ` Vladimir Sementsov-Ogievskiy
  2020-04-14 12:20     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14  9:44 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> qcow2 images with subclusters have 128-bit L2 entries. The first 64
> bits contain the same information as traditional images and the last
> 64 bits form a bitmap with the status of each individual subcluster.
> 
> Because of that we cannot assume that L2 entries are sizeof(uint64_t)
> anymore. This function returns the proper value for the image.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2.h          |  9 +++++++++
>   block/qcow2-cluster.c  | 12 ++++++------
>   block/qcow2-refcount.c | 14 ++++++++------
>   block/qcow2.c          |  8 ++++----
>   4 files changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 06929072d2..1eb4b46807 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -80,6 +80,10 @@
>   
>   #define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
>   
> +/* Size of normal and extended L2 entries */
> +#define L2E_SIZE_NORMAL   (sizeof(uint64_t))
> +#define L2E_SIZE_EXTENDED (sizeof(uint64_t) * 2)
> +
>   #define MIN_CLUSTER_BITS 9
>   #define MAX_CLUSTER_BITS 21
>   
> @@ -506,6 +510,11 @@ static inline bool has_subclusters(BDRVQcow2State *s)
>       return false;
>   }
>   
> +static inline size_t l2_entry_size(BDRVQcow2State *s)
> +{
> +    return has_subclusters(s) ? L2E_SIZE_EXTENDED : L2E_SIZE_NORMAL;
> +}
> +
>   static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
>                                       int idx)
>   {
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index cd48ab0223..41a23c5305 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -208,7 +208,7 @@ static int l2_load(BlockDriverState *bs, uint64_t offset,
>                      uint64_t l2_offset, uint64_t **l2_slice)
>   {
>       BDRVQcow2State *s = bs->opaque;
> -    int start_of_slice = sizeof(uint64_t) *
> +    int start_of_slice = l2_entry_size(s) *
>           (offset_to_l2_index(s, offset) - offset_to_l2_slice_index(s, offset));
>   
>       return qcow2_cache_get(bs, s->l2_table_cache, l2_offset + start_of_slice,
> @@ -281,7 +281,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)
>   
>       /* allocate a new l2 entry */
>   
> -    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
> +    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));

Hmm, isn't s->l2_size * l2_entry_size(s) always just s->cluster_size?
Maybe just refactor these things?


>       if (l2_offset < 0) {
>           ret = l2_offset;
>           goto fail;
> @@ -305,7 +305,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index)

[...]

> @@ -1425,7 +1425,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
>           bs->encrypted = true;
>       }
>   
> -    s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
> +    s->l2_bits = s->cluster_bits - ctz32(l2_entry_size(s));
>       s->l2_size = 1 << s->l2_bits;
>       /* 2^(s->refcount_order - 3) is the refcount width in bytes */
>       s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
> @@ -4104,7 +4104,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>            *  preallocation. All that matters is that we will not have to allocate
>            *  new refcount structures for them.) */
>           nb_new_l2_tables = DIV_ROUND_UP(nb_new_data_clusters,
> -                                        s->cluster_size / sizeof(uint64_t));
> +                                        s->cluster_size / l2_entry_size(s));

Isn't this just s->l2_size?

>           /* The cluster range may not be aligned to L2 boundaries, so add one L2
>            * table for a potential head/tail */
>           nb_new_l2_tables++;
> 


The conversions look correct, but how do we check that we have converted everything?

Trying at least

    cd block; git grep 'sizeof(uint64_t)' qcow2* | grep -v 'l1_size \*' | grep -v 'l1_sz \*' | grep -v refcount | grep -v reftable

I found this chunk that has not been converted:

     /* total size of L2 tables */
     nl2e = aligned_total_size / cluster_size;
     nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
     meta_size += nl2e * sizeof(uint64_t);


Hmm. How can we avoid this? Maybe, at least, refactor the code to drop all
uses of sizeof(uint64_t), converting them to L2_ENTRY_SIZE, L1_ENTRY_SIZE,
REFTABLE_ENTRY_SIZE, etc.? And the same for all occurrences of a bare '8'
(there are not many of them).
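
To illustrate both points (the invariant and the unconverted chunk), here
is a rough standalone sketch. All names are hypothetical; this only
mirrors the arithmetic from the patch with the entry size made explicit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t sketch_l2_entry_size(bool extended_l2)
{
    return extended_l2 ? 2 * sizeof(uint64_t) : sizeof(uint64_t);
}

/* Count trailing zero bits; v must be non-zero. */
static unsigned sketch_ctz32(uint32_t v)
{
    unsigned n = 0;

    assert(v != 0);
    while ((v & 1) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* s->l2_bits = s->cluster_bits - ctz32(l2_entry_size(s)) */
static unsigned sketch_l2_bits(unsigned cluster_bits, bool extended_l2)
{
    uint32_t entry_size = (uint32_t)sketch_l2_entry_size(extended_l2);

    return cluster_bits - sketch_ctz32(entry_size);
}

/*
 * Total size of the L2 tables, following the unconverted chunk above but
 * using the entry size of the image instead of sizeof(uint64_t).
 */
static uint64_t sketch_l2_tables_size(uint64_t aligned_total_size,
                                      uint64_t cluster_size,
                                      bool extended_l2)
{
    uint64_t entry_size = sketch_l2_entry_size(extended_l2);
    uint64_t per_table = cluster_size / entry_size;
    uint64_t nl2e = aligned_total_size / cluster_size;

    /* ROUND_UP(nl2e, entries per L2 table) */
    nl2e = (nl2e + per_table - 1) / per_table * per_table;
    return nl2e * entry_size;
}
```

Because an L2 table is always one cluster, (1 << l2_bits) multiplied by
the entry size equals the cluster size in both cases.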

-- 
Best regards,
Vladimir



* Re: [PATCH v4 12/30] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
  2020-03-17 18:16 ` [PATCH v4 12/30] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
@ 2020-04-14  9:49   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14  9:49 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> Extended L2 entries are 128-bit wide: 64 bits for the entry itself and
> 64 bits for the subcluster allocation bitmap.
> 
> In order to support them correctly get/set_l2_entry() need to be
> updated so they take the entry width into account in order to
> calculate the correct offset.
> 
> This patch also adds the get/set_l2_bitmap() functions that are
> used to access the bitmaps. For convenience we allow calling
> get_l2_bitmap() on images without subclusters, although the caller
> does not need and should ignore the returned value.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
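
For readers following along, the index calculation described in the
commit message can be sketched roughly as below. ImageState is a
simplified stand-in for the driver state, endianness conversion is left
out, and returning 0 for images without subclusters is an assumption of
this sketch rather than something the commit message states:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the relevant driver state */
typedef struct {
    bool extended_l2;   /* true if L2 entries are 128 bits wide */
} ImageState;

/* Number of 64-bit words per L2 entry: 1 normally, 2 with extended L2 */
static int words_per_l2_entry(const ImageState *s)
{
    return s->extended_l2 ? 2 : 1;
}

/* First word of entry idx: the L2 entry itself */
static uint64_t get_l2_entry(const ImageState *s, const uint64_t *l2_slice,
                             int idx)
{
    assert(idx >= 0);
    return l2_slice[idx * words_per_l2_entry(s)];
}

/* Second word of entry idx: the subcluster allocation bitmap */
static uint64_t get_l2_bitmap(const ImageState *s, const uint64_t *l2_slice,
                              int idx)
{
    assert(idx >= 0);
    if (!s->extended_l2) {
        return 0;   /* no bitmap exists; callers should ignore this value */
    }
    return l2_slice[idx * words_per_l2_entry(s) + 1];
}
```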


-- 
Best regards,
Vladimir



* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-04-10 12:47     ` Alberto Garcia
@ 2020-04-14 10:13       ` Max Reitz
  0 siblings, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-14 10:13 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



On 10.04.20 14:47, Alberto Garcia wrote:
> On Thu 09 Apr 2020 12:05:12 PM CEST, Max Reitz wrote:
>>>          switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>>          case QCOW2_CLUSTER_UNALLOCATED:
>>> -            if (full_discard || !bs->backing) {
>>> +            if (full_discard) {
>>> +                /* If the image has extended L2 entries we can only
>>> +                 * skip this operation if the L2 bitmap is zero. */
>>> +                uint64_t bitmap = has_subclusters(s) ?
>>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>>
>> Isn’t this bitmap only valid for standard clusters?  In this case, the
>> whole cluster is unallocated, so the bitmap shouldn’t be relevant,
>> AFAIU.
> 
> I'm not sure if I follow you.
> 
> An unallocated cluster can still have QCOW_OFLAG_SUB_ZERO set in some of
> its subclusters. Those read as zeroes and the rest go to the backing
> file.

Hm, right, this is the only way to have non-preallocated zero clusters
after all.

I suppose I read the spec wrong and assumed somehow that unallocated
clusters don’t use “standard cluster descriptors”, so their bitmap usage
would be undefined.  Don’t know how that happened.

> After a full discard all subclusters should be completely deallocated so
> those bits should be cleared.
> 
> If the bitmap is already 0 (the whole cluster is already unallocated) or
> if the image does not have extended L2 entries (which also means that
> the whole cluster is already unallocated) then we can skip the discard.

Yep, seems right.
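
The rule agreed on above, for a cluster that is already unallocated
during a full discard, can be condensed into a small predicate. This is
only a sketch: can_skip_full_discard() is a hypothetical helper, and the
bitmap argument is expected to be 0 for images without extended L2
entries, as in the quoted hunk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a full discard of an unallocated cluster has nothing
 * left to do, i.e. no subcluster bits remain set in the L2 bitmap.
 */
static bool can_skip_full_discard(bool has_subclusters, uint64_t l2_bitmap)
{
    /* Images without extended L2 entries have no bitmap to clear */
    assert(has_subclusters || l2_bitmap == 0);
    return l2_bitmap == 0;
}
```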

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries
  2020-04-13 17:16     ` Alberto Garcia
@ 2020-04-14 10:14       ` Max Reitz
  0 siblings, 0 replies; 128+ messages in thread
From: Max Reitz @ 2020-04-14 10:14 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev



On 13.04.20 19:16, Alberto Garcia wrote:
> On Thu 09 Apr 2020 02:22:37 PM CEST, Max Reitz wrote:
>>> +    ### Write subcluster #31-#34 (cluster overlap) ###
>>
>> #31-#34, I think.
> 
> That's what I wrote :-?

Errrrr #31-#33.

>>> +    ### Partially zeroize an unallocated cluster (#3)
>>> +    if [ "$use_backing_file" = "yes" ]; then
>>> +        alloc="`seq 0 15`"; zero=""
>>
>> Isn’t this a TODO?  (I.e., ideally we’d want the first 16 subclusters
>> to be zero, and the last 16 subclusters to be unallocated, right?)
>>
>> (I’m asking because you did raise a TODO for the “Zero subcluster #1”
>> test)
> 
> Maybe, but I just implemented zeroize at the subcluster level :-) Wait
> for the next version of the series.

OK :)

Max




* Re: [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
  2020-03-17 18:16 ` [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
  2020-04-08 11:23   ` Max Reitz
@ 2020-04-14 11:10   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14 11:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> This patch adds QCow2SubclusterType, which is the subcluster-level
> version of QCow2ClusterType. All QCOW2_SUBCLUSTER_* values have the
> same meaning as their QCOW2_CLUSTER_* equivalents (when they
> exist). See below for details and caveats.
> 
> In images without extended L2 entries clusters are treated as having
> exactly one subcluster so it is possible to replace one data type with
> the other while keeping the exact same semantics.
> 
> With extended L2 entries there are new possible values, and every
> subcluster in the same cluster can obviously have a different
> QCow2SubclusterType so functions need to be adapted to work on the
> subcluster level.
> 
> There are several things that have to be taken into account:
> 
>    a) QCOW2_SUBCLUSTER_COMPRESSED means that the whole cluster is
>       compressed. We do not support compression at the subcluster
>       level.
> 
>    b) There are two different values for unallocated subclusters:
>       QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN which means that the whole
>       cluster is unallocated, and QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
>       which means that the cluster is allocated but the subcluster is
>       not. The latter can only happen in images with extended L2
>       entries.
> 
>    c) QCOW2_SUBCLUSTER_INVALID is used to detect the cases where an L2
>       entry has a value that violates the specification. The caller is
>       responsible for handling these situations.
> 
>       To prevent compatibility problems with images that have invalid
>       values but are currently being read by QEMU without causing side
>       effects, QCOW2_SUBCLUSTER_INVALID is only returned for images
>       with extended L2 entries.
> 
> qcow2_cluster_to_subcluster_type() is added as a separate function
> from qcow2_get_subcluster_type(), but this is only temporary and both
> will be merged in a subsequent patch.
> 
> Signed-off-by: Alberto Garcia<berto@igalia.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
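
As an illustration of points (b) and (c) in the commit message, here is a
rough sketch of how a per-subcluster type might be derived for a
non-compressed cluster. The SC_* names are hypothetical, and the bitmap
layout (bit n is the allocation bit and bit 32 + n the "reads as zeroes"
bit of subcluster n) is an assumption of this sketch, not taken from the
text quoted here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SC_UNALLOCATED_PLAIN,  /* whole cluster is unallocated */
    SC_UNALLOCATED_ALLOC,  /* cluster allocated, this subcluster is not */
    SC_ZERO_PLAIN,
    SC_ZERO_ALLOC,
    SC_NORMAL,
    SC_INVALID,            /* bit combination that makes no sense */
} SubclusterType;

static SubclusterType subcluster_type(bool cluster_allocated,
                                      uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    bool alloc = bitmap & (1ULL << sc);
    bool zero = bitmap & (1ULL << (32 + sc));

    if (alloc && zero) {
        return SC_INVALID;          /* cannot be both data and zeroes */
    }
    if (zero) {
        return cluster_allocated ? SC_ZERO_ALLOC : SC_ZERO_PLAIN;
    }
    if (alloc) {
        /* Allocated subcluster without a host cluster is inconsistent */
        return cluster_allocated ? SC_NORMAL : SC_INVALID;
    }
    return cluster_allocated ? SC_UNALLOCATED_ALLOC : SC_UNALLOCATED_PLAIN;
}
```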

-- 
Best regards,
Vladimir



* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-04-14  9:44   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 12:20     ` Alberto Garcia
  2020-04-14 12:29       ` Vladimir Sementsov-Ogievskiy
  2020-04-14 16:01       ` Eric Blake
  0 siblings, 2 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 12:20 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Tue 14 Apr 2020 11:44:57 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>       /* allocate a new l2 entry */
>>   
>> -    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
>> +    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
>
> hmm. s->l2_size * l2_entry_size, isn't it just s->cluster_size always?
> Maybe, just refactor these things?

I think the patch is simpler to follow if I only do the strictly
necessary changes and don't mix them with other things.

>>           nb_new_l2_tables = DIV_ROUND_UP(nb_new_data_clusters,
>> -                                        s->cluster_size / sizeof(uint64_t));
>> +                                        s->cluster_size / l2_entry_size(s));
>
> Isn't it just s->l2_size ?

Yes, same as before.

>>           /* The cluster range may not be aligned to L2 boundaries, so add one L2
>>            * table for a potential head/tail */
>>           nb_new_l2_tables++;
>
> Conversions looks correct, but how to check that we have converted
> everything?

I went through all cases, I think I didn't miss any!

> I found this not converted chunk:
>
>      /* total size of L2 tables */
>      nl2e = aligned_total_size / cluster_size;
>      nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
>      meta_size += nl2e * sizeof(uint64_t);

This is used by qcow2_measure() and is fixed in a later patch because,
unlike all other cases, it does not use a BlockDriverState to determine
the size of an L2 entry.
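
A sketch of what that calculation looks like once the entry size becomes
a parameter instead of a hard-coded sizeof(uint64_t). The function and
helper names are hypothetical; only the arithmetic mirrors the chunk
quoted above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of d */
static uint64_t round_up_u64(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d * d;
}

/* Total size in bytes of the L2 tables needed to map aligned_total_size */
static uint64_t l2_tables_size(uint64_t aligned_total_size,
                               uint64_t cluster_size, size_t l2_entry_size)
{
    /* 8 bytes for normal entries, 16 for extended L2 entries */
    assert(l2_entry_size == 8 || l2_entry_size == 16);
    assert(cluster_size >= l2_entry_size);

    uint64_t nl2e = aligned_total_size / cluster_size;
    nl2e = round_up_u64(nl2e, cluster_size / l2_entry_size);
    return nl2e * l2_entry_size;
}
```

For a 1 GiB image with 64 KiB clusters this gives 128 KiB of L2 tables
with 8-byte entries and 256 KiB with 16-byte entries.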

> Hmm. How to avoid it? Maybe, at least, refactor the code, to drop all
> sizeof(uint64_t), converting them to L2_ENTRY_SIZE, L1_ENTRY_SIZE,
> REFTABLE_ENTRY_SIZE etc?

That wouldn't be a bad thing I guess but, again, for a separate patch or
series.

> And all occurrences of pure '8' (not many of them exist)

I think most/all nowadays only refer to the number of bits per byte.

Maybe there's a couple that still need to be fixed, but we have been
removing a lot of numeric literals from the qcow2 code (see for example
b6c246942b, 3afea40243 or a35f87f50d).

Berto



* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-04-14 12:20     ` Alberto Garcia
@ 2020-04-14 12:29       ` Vladimir Sementsov-Ogievskiy
  2020-04-14 12:33         ` Alberto Garcia
  2020-04-14 16:01       ` Eric Blake
  1 sibling, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14 12:29 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

14.04.2020 15:20, Alberto Garcia wrote:
> On Tue 14 Apr 2020 11:44:57 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>>        /* allocate a new l2 entry */
>>>    
>>> -    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * sizeof(uint64_t));
>>> +    l2_offset = qcow2_alloc_clusters(bs, s->l2_size * l2_entry_size(s));
>>
>> hmm. s->l2_size * l2_entry_size, isn't it just s->cluster_size always?
>> Maybe, just refactor these things?
> 
> I think the patch is simpler to follow if I only do the strictly
> necessary changes and don't mix them with other things.
> 
>>>            nb_new_l2_tables = DIV_ROUND_UP(nb_new_data_clusters,
>>> -                                        s->cluster_size / sizeof(uint64_t));
>>> +                                        s->cluster_size / l2_entry_size(s));
>>
>> Isn't it just s->l2_size ?
> 
> Yes, same as before.
> 
>>>            /* The cluster range may not be aligned to L2 boundaries, so add one L2
>>>             * table for a potential head/tail */
>>>            nb_new_l2_tables++;
>>
>> Conversions looks correct, but how to check that we have converted
>> everything?
> 
> I went through all cases, I think I didn't miss any!
> 
>> I found this not converted chunk:
>>
>>       /* total size of L2 tables */
>>       nl2e = aligned_total_size / cluster_size;
>>       nl2e = ROUND_UP(nl2e, cluster_size / sizeof(uint64_t));
>>       meta_size += nl2e * sizeof(uint64_t);
> 
> This is used by qcow2_measure() and is fixed on a later patch because,
> unlike all other cases, it does not use a BlockDriverState to determine
> the size of an L2 entry.
> 
>> Hmm. How to avoid it? Maybe, at least, refactor the code, to drop all
>> sizeof(uint64_t), converting them to L2_ENTRY_SIZE, L1_ENTRY_SIZE,
>> REFTABLE_ENTRY_SIZE etc?
> 
> That wouldn't be a bad thing I guess but, again, for a separate patch or
> series.
> 
>> And all occurrences of pure '8' (not many of them exist)
> 
> I think most/all nowadays only refer to the number of bits per byte.
> 
> Maybe there's a couple that still need to be fixed, but we have been
> removing a lot of numeric literals from the qcow2 code (see for example
> b6c246942b, 3afea40243 or a35f87f50d).
> 


git grep '\<8\>' block/qcow2*

shows at least

qcow2-cluster.c:            s->l1_table_offset + 8 * l1_start_index, bufsize, false);
qcow2-cluster.c:                           s->l1_table_offset + 8 * l1_start_index,


-- 
Best regards,
Vladimir



* Re: [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset()
  2020-03-17 18:16 ` [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset() Alberto Garcia
  2020-04-08 12:15   ` Max Reitz
@ 2020-04-14 12:30   ` Vladimir Sementsov-Ogievskiy
  2020-04-14 12:38     ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14 12:30 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> This function returns an integer that can be either an error code or a
> cluster type (a value from the QCow2ClusterType enum).
> 
> We are going to start using subcluster types instead of cluster types
> in some functions so it's better to use the exact data types instead
> of integers for clarity and in order to detect errors more easily.
> 
> This patch makes qcow2_get_host_offset() return 0 on success and
> puts the returned cluster type in a separate parameter. There are no
> semantic changes.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2.h         |  3 ++-
>   block/qcow2-cluster.c | 11 +++++++----
>   block/qcow2.c         | 37 ++++++++++++++++++++++---------------
>   3 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 52865787ee..6b7b286b91 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h


[..]

> @@ -3716,6 +3719,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>       if (head || tail) {
>           uint64_t off;
>           unsigned int nr;
> +        QCow2ClusterType type;
>   
>           assert(head + bytes <= s->cluster_size);
>   
> @@ -3731,10 +3735,11 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>           offset = QEMU_ALIGN_DOWN(offset, s->cluster_size);
>           bytes = s->cluster_size;
>           nr = s->cluster_size;
> -        ret = qcow2_get_host_offset(bs, offset, &nr, &off);
> -        if (ret != QCOW2_CLUSTER_UNALLOCATED &&
> -            ret != QCOW2_CLUSTER_ZERO_PLAIN &&
> -            ret != QCOW2_CLUSTER_ZERO_ALLOC) {
> +        ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);

This is pre-existing, but it would probably be better to return the original errno when qcow2_get_host_offset() fails, instead of masking it.

> +        if (ret < 0 ||
> +            (type != QCOW2_CLUSTER_UNALLOCATED &&
> +             type != QCOW2_CLUSTER_ZERO_PLAIN &&
> +             type != QCOW2_CLUSTER_ZERO_ALLOC)) {
>               qemu_co_mutex_unlock(&s->lock);
>               return -ENOTSUP;
>           }
> @@ -3792,16 +3797,18 @@ qcow2_co_copy_range_from(BlockDriverState *bs,
>   
>       while (bytes != 0) {
>           uint64_t copy_offset = 0;
> +        QCow2ClusterType type;
>           /* prepare next request */
>           cur_bytes = MIN(bytes, INT_MAX);
>           cur_write_flags = write_flags;
>   
> -        ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes, &copy_offset);
> +        ret = qcow2_get_host_offset(bs, src_offset, &cur_bytes,
> +                                    &copy_offset, &type);
>           if (ret < 0) {
>               goto out;
>           }
>   
> -        switch (ret) {
> +        switch (type) {
>           case QCOW2_CLUSTER_UNALLOCATED:
>               if (bs->backing && bs->backing->bs) {
>                   int64_t backing_length = bdrv_getlength(bs->backing->bs);
> 
Hmm, I just noticed that if bdrv_co_copy_range_from() fails below, we lock and unlock the mutex for nothing.

I think we want the mutex lock/unlock just around the qcow2_co_preadv_part() call, like in qcow2_co_preadv_part() above.

I can send a refactoring patch.

Anyway, the patch itself is OK:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-04-14 12:29       ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 12:33         ` Alberto Garcia
  2020-04-14 12:39           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 12:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Tue 14 Apr 2020 02:29:13 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>> Hmm. How to avoid it? Maybe, at least, refactor the code, to drop all
>>> sizeof(uint64_t), converting them to L2_ENTRY_SIZE, L1_ENTRY_SIZE,
>>> REFTABLE_ENTRY_SIZE etc?
>> 
>> That wouldn't be a bad thing I guess but, again, for a separate patch or
>> series.
>> 
>>> And all occurrences of pure '8' (not many of them exist)
>> 
>> I think most/all nowadays only refer to the number of bits per byte.
>> 
>> Maybe there's a couple that still need to be fixed, but we have been
>> removing a lot of numeric literals from the qcow2 code (see for example
>> b6c246942b, 3afea40243 or a35f87f50d).
>> 
>
>
> git grep '\<8\>' block/qcow2*
>
> shows at least
>
> qcow2-cluster.c:            s->l1_table_offset + 8 * l1_start_index, bufsize, false);
> qcow2-cluster.c:                           s->l1_table_offset + 8 * l1_start_index,

I see, those are worth replacing with L1_ENTRY_SIZE as you suggest. I can
take care of writing the patches if you want.

Berto



* Re: [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset()
  2020-04-14 12:30   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 12:38     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 12:38 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Tue 14 Apr 2020 02:30:30 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> -        ret = qcow2_get_host_offset(bs, offset, &nr, &off);
>> -        if (ret != QCOW2_CLUSTER_UNALLOCATED &&
>> -            ret != QCOW2_CLUSTER_ZERO_PLAIN &&
>> -            ret != QCOW2_CLUSTER_ZERO_ALLOC) {
>> +        ret = qcow2_get_host_offset(bs, offset, &nr, &off, &type);
>
> pre-patch, but probably better to return original errno on
> qcow2_get_host_offset failure, instead of masking it.

Yeah, I think you're right. I can take care of that in a separate patch.
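
A sketch of the suggested behaviour, where a lookup failure is passed
through and -ENOTSUP is kept only for the cluster types that cannot be
zeroed this way. The enum and the helper are hypothetical simplifications
of the quoted hunk, not the eventual patch:

```c
#include <assert.h>
#include <errno.h>

typedef enum {
    CT_UNALLOCATED,
    CT_ZERO_PLAIN,
    CT_ZERO_ALLOC,
    CT_NORMAL,
    CT_COMPRESSED,
} ClusterType;

/*
 * lookup_ret is the return value of the host offset lookup (0 on success,
 * negative errno on failure); type is only meaningful on success.
 */
static int check_partial_zero_write(int lookup_ret, ClusterType type)
{
    assert(lookup_ret <= 0);
    if (lookup_ret < 0) {
        return lookup_ret;      /* propagate the original error */
    }
    if (type != CT_UNALLOCATED &&
        type != CT_ZERO_PLAIN &&
        type != CT_ZERO_ALLOC) {
        return -ENOTSUP;        /* caller falls back to a normal write */
    }
    return 0;
}
```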

Berto



* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-04-14 12:33         ` Alberto Garcia
@ 2020-04-14 12:39           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14 12:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

14.04.2020 15:33, Alberto Garcia wrote:
> On Tue 14 Apr 2020 02:29:13 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hmm. How to avoid it? Maybe, at least, refactor the code, to drop all
>>>> sizeof(uint64_t), converting them to L2_ENTRY_SIZE, L1_ENTRY_SIZE,
>>>> REFTABLE_ENTRY_SIZE etc?
>>>
>>> That wouldn't be a bad thing I guess but, again, for a separate patch or
>>> series.
>>>
>>>> And all occurrences of pure '8' (not many of them exist)
>>>
>>> I think most/all nowadays only refer to the number of bits per byte.
>>>
>>> Maybe there's a couple that still need to be fixed, but we have been
>>> removing a lot of numeric literals from the qcow2 code (see for example
>>> b6c246942b, 3afea40243 or a35f87f50d).
>>>
>>
>>
>> git grep '\<8\>' block/qcow2*
>>
>> shows at least
>>
>> qcow2-cluster.c:            s->l1_table_offset + 8 * l1_start_index, bufsize, false);
>> qcow2-cluster.c:                           s->l1_table_offset + 8 * l1_start_index,
> 
> I see, worth replacing with L1_ENTRY_SIZE as you suggest. I can take of
> writing the patches if you want.
> 

That would be great, if not too burdensome :)


-- 
Best regards,
Vladimir



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 14:50       ` Alberto Garcia
  2020-04-14 16:19         ` Vladimir Sementsov-Ogievskiy
  2020-04-15 19:11       ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 14:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Fri 10 Apr 2020 11:29:59 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> Hmm - raw external files are incompatible with backing files. Should
>> we also document that extended L2 entries are incompatible with raw
>> external files? (The text here reminded me about it, but it would be
>> the text earlier at the incompatible feature bits that we edit if we
>> want that additional restriction; compare to the restriction in the
>> autoclear bit 1). After all, when raw external file is enabled, the
>> entire image is allocated, at which point subclusters don't make much
>> sense.
> It still may cache information about zeroed subclusters: gives more
> detailed block-status. But we should mention somehow external
> files. Hm. not only for raw external files, but it is documented that
> cluster can't be unallocated when an external data file is used.

What do you mean by "cluster can't be unallocated" ?

Berto



* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-04-14 12:20     ` Alberto Garcia
  2020-04-14 12:29       ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 16:01       ` Eric Blake
  2020-04-14 16:16         ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Eric Blake @ 2020-04-14 16:01 UTC (permalink / raw)
  To: Alberto Garcia, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On 4/14/20 7:20 AM, Alberto Garcia wrote:

>> Hmm. How to avoid it? Maybe, at least, refactor the code, to drop all
>> sizeof(uint64_t), converting them to L2_ENTRY_SIZE, L1_ENTRY_SIZE,
>> REFTABLE_ENTRY_SIZE etc?
> 
> That wouldn't be a bad thing I guess but, again, for a separate patch or
> series.
> 
>> And all occurrences of pure '8' (not many of them exist)
> 
> I think most/all nowadays only refer to the number of bits per byte.

CHAR_BIT (from <limits.h>) is good for that.
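
A small sketch of that usage, with a hypothetical helper name; it simply
expresses "bits per byte" without a bare 8:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits in an object of the given size in bytes */
static size_t bits_in(size_t nbytes)
{
    assert(nbytes <= SIZE_MAX / CHAR_BIT);  /* avoid overflow */
    return nbytes * CHAR_BIT;
}
```

For example, bits_in(sizeof(uint64_t)) is the width of one L2 entry in
bits on platforms where CHAR_BIT is 8.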

> 
> Maybe there's a couple that still need to be fixed, but we have been
> removing a lot of numeric literals from the qcow2 code (see for example
> b6c246942b, 3afea40243 or a35f87f50d).
> 
> Berto
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 11/30] qcow2: Add l2_entry_size()
  2020-04-14 16:01       ` Eric Blake
@ 2020-04-14 16:16         ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 16:16 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Tue 14 Apr 2020 06:01:42 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>> And all occurrences of pure '8' (not many of them exist)
>> 
>> I think most/all nowadays only refer to the number of bits per byte.
>
> CHAR_BIT (from <limits.h>) is good for that.

Wow, ok, I wonder if that actually makes the code more readable, but
I'll take it into account when writing the patch, thanks.

Berto



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-14 14:50       ` Alberto Garcia
@ 2020-04-14 16:19         ` Vladimir Sementsov-Ogievskiy
  2020-04-14 16:30           ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14 16:19 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

14.04.2020 17:50, Alberto Garcia wrote:
> On Fri 10 Apr 2020 11:29:59 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>> Hmm - raw external files are incompatible with backing files. Should
>>> we also document that extended L2 entries are incompatible with raw
>>> external files? (The text here reminded me about it, but it would be
>>> the text earlier at the incompatible feature bits that we edit if we
>>> want that additional restriction; compare to the restriction in the
>>> autoclear bit 1). After all, when raw external file is enabled, the
>>> entire image is allocated, at which point subclusters don't make much
>>> sense.
>> It still may cache information about zeroed subclusters: gives more
>> detailed block-status. But we should mention somehow external
>> files. Hm. not only for raw external files, but it is documented that
>> cluster can't be unallocated when an external data file is used.
> 
> What do you mean by "cluster can't be unallocated" ?
> 


I mean this sentence from qcow2.txt:

                    "The offset may only be 0 with
                     bit 63 set (indicating a host cluster offset of 0) when an
                     external data file is used."

In other words, a cluster can't be unallocated when a data file is in use.

-- 
Best regards,
Vladimir



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-14 16:19         ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 16:30           ` Alberto Garcia
  2020-04-14 18:06             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 16:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Tue 14 Apr 2020 06:19:18 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>> It still may cache information about zeroed subclusters: gives more
>>> detailed block-status. But we should mention somehow external
>>> files. Hm. not only for raw external files, but it is documented that
>>> cluster can't be unallocated when an external data file is used.
>> 
>> What do you mean by "cluster can't be unallocated" ?
>
> I mean this sentence from qcow2.txt:
>
>                     "The offset may only be 0 with
>                      bit 63 set (indicating a host cluster offset of 0) when an
>                      external data file is used."
>
> In other words, cluster can't be unallocated with data file in use.

I still don't follow... clusters can be unallocated, and when you create
a new image they are indeed unallocated.

Bit 63 (QCOW_OFLAG_COPIED) is what indicates if a cluster is allocated
or not, and you can unmap an allocated cluster with 'write -z -u'.

Berto



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-14 16:30           ` Alberto Garcia
@ 2020-04-14 18:06             ` Vladimir Sementsov-Ogievskiy
  2020-04-14 18:13               ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-14 18:06 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

14.04.2020 19:30, Alberto Garcia wrote:
> On Tue 14 Apr 2020 06:19:18 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>>> It still may cache information about zeroed subclusters: gives more
>>>> detailed block-status. But we should mention somehow external
>>>> files. Hm. not only for raw external files, but it is documented that
>>>> cluster can't be unallocated when an external data file is used.
>>>
>>> What do you mean by "cluster can't be unallocated" ?
>>
>> I mean this sentence from qcow2.txt:
>>
>>                      "The offset may only be 0 with
>>                       bit 63 set (indicating a host cluster offset of 0) when an
>>                       external data file is used."
>>
>> In other words, cluster can't be unallocated with data file in use.
> 
> I still don't follow... clusters can be unallocated, and when you create
> a new image they are indeed unallocated.

With an external data file? Then we probably need to fix the spec.

Unallocated means that the offset is 0 and bit 63 is unset. But according to the spec, this can't happen when an external data file is used.

Or what am I missing?

-- 
Best regards,
Vladimir



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-14 18:06             ` Vladimir Sementsov-Ogievskiy
@ 2020-04-14 18:13               ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 18:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Tue 14 Apr 2020 08:06:38 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>> In other words, cluster can't be unallocated with data file in use.
>> 
>> I still don't follow... clusters can be unallocated, and when you
>> create a new image they are indeed unallocated.
>
> with external data file? Than we probably need to fix spec..
>
> unallocated mean that offset is 0, and bit 63 is unset. But this can't
> be when and exernal data file is used, accordingly to the spec.
>
> Or what am I missing?

   $ qemu-img create -f qcow2 -o data_file=data.raw img.qcow2 1M
   $ qemu-io -c 'write 0 192k' img.qcow2 
   $ qemu-io -c 'write -z -u 64k 64k' img.qcow2 

Clusters #0 and #2 are allocated (offsets 0x00000 and 0x20000), cluster
#1 is unallocated (offset 0, bit 63 unset, bit 0 -all zeroes- set).
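
To make the three fields mentioned above concrete, here is a sketch that
splits a standard (non-compressed) L2 entry into host offset, bit 63 and
bit 0. The macro and struct names are hypothetical; the mask assumes the
host offset lives in bits 9 to 55 of the entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L2E_ZERO_FLAG    (1ULL << 0)    /* cluster reads as all zeroes */
#define L2E_COMPRESSED   (1ULL << 62)
#define L2E_COPIED       (1ULL << 63)
#define L2E_OFFSET_MASK  0x00fffffffffffe00ULL

typedef struct {
    uint64_t host_offset;
    bool copied;          /* bit 63 */
    bool reads_as_zero;   /* bit 0 */
} L2Fields;

static L2Fields decode_l2_entry(uint64_t entry)
{
    assert(!(entry & L2E_COMPRESSED));  /* compressed entries not handled */
    L2Fields f = {
        .host_offset = entry & L2E_OFFSET_MASK,
        .copied = entry & L2E_COPIED,
        .reads_as_zero = entry & L2E_ZERO_FLAG,
    };
    return f;
}
```

In the example above, cluster #0 would decode to offset 0 with bit 63
set, while cluster #1 would decode to offset 0 with bit 63 clear and
bit 0 set.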

Berto



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-09 15:12   ` Eric Blake
  2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
  2020-04-10 12:01     ` Alberto Garcia
@ 2020-04-14 18:16     ` Alberto Garcia
  2020-04-14 18:23       ` Eric Blake
  2 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-14 18:16 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Thu 09 Apr 2020 05:12:16 PM CEST, Eric Blake <eblake@redhat.com> wrote:
> Hmm - raw external files are incompatible with backing files.

Pre-existing, but I just realized that we are not checking that in
qcow2_do_open(), only on _create().

I suppose that if we find such an image we should either

   a) Show an error message and abort.
   b) Clear the 'raw data file' bit and proceed as if it was unset.

Berto



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-14 18:16     ` Alberto Garcia
@ 2020-04-14 18:23       ` Eric Blake
  2020-04-14 18:25         ` Eric Blake
  0 siblings, 1 reply; 128+ messages in thread
From: Eric Blake @ 2020-04-14 18:23 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 4/14/20 1:16 PM, Alberto Garcia wrote:
> On Thu 09 Apr 2020 05:12:16 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>> Hmm - raw external files are incompatible with backing files.
> 
> Pre-existing, but I just realized that we are not checking that in
> qcow2_do_open(), only on _create().
> 
> I suppose that if we find such an image we should either
> 
>     a) Show an error message and abort.
>     b) Clear the 'raw data file' bit and proceed as if it was unset.

I would favor a).  Such an image was (hopefully) created externally, and 
not by qemu; therefore refusing to open it will call attention to the 
image (and its creation process) being broken, rather than risking 
silent corruption of whatever the external process thought it was 
accomplishing by creating an image like that.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-14 18:23       ` Eric Blake
@ 2020-04-14 18:25         ` Eric Blake
  0 siblings, 0 replies; 128+ messages in thread
From: Eric Blake @ 2020-04-14 18:25 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 4/14/20 1:23 PM, Eric Blake wrote:
> On 4/14/20 1:16 PM, Alberto Garcia wrote:
>> On Thu 09 Apr 2020 05:12:16 PM CEST, Eric Blake <eblake@redhat.com> 
>> wrote:
>>> Hmm - raw external files are incompatible with backing files.
>>
>> Pre-existing, but I just realized that we are not checking that in
>> qcow2_do_open(), only on _create().
>>
>> I suppose that if we find such an image we should either
>>
>>     a) Show an error message and abort.
>>     b) Clear the 'raw data file' bit and proceed as if it was unset.
> 
> I would favor a).  Such an image was (hopefully) created externally, and 
> not by qemu; therefore refusing to open it will call attention to the 
> image (and its creation process) being broken, rather than risking 
> silent corruption of whatever the external process thought it was 
> accomplishing by creating an image like that.

Also, 'qemu-img check' should flag the problem, and I'd be okay with 
'qemu-img check -r all' repairing the problem by method b) (because then 
the user is explicitly opting in to having qemu change the image in 
order to maximize the amount of data that qemu can then extract from the 
image).
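
To make the proposed policy concrete, here is a rough sketch: refuse
such an image on a normal open (option a), and clear the bit only when
the user explicitly asked for a repair (option b). Everything in it is
hypothetical; the struct, the enum and the function are stand-ins rather
than QEMU code, and the bit position assumes that 'raw external data' is
autoclear bit 1, as I read the qcow2 spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch; not QEMU's actual structures. */
#define SKETCH_DATA_FILE_RAW  (1ULL << 1)   /* assumed autoclear bit 1 */

struct sketch_image {
    uint64_t autoclear_features;
    bool has_backing_file;
};

enum sketch_result { SKETCH_OK, SKETCH_REJECTED, SKETCH_REPAIRED };

/* Option a) on a normal open, option b) only when repair was requested. */
static enum sketch_result
sketch_check_raw_data_file(struct sketch_image *img, bool repair)
{
    bool raw = (img->autoclear_features & SKETCH_DATA_FILE_RAW) != 0;

    if (!raw || !img->has_backing_file) {
        return SKETCH_OK;           /* nothing inconsistent to report */
    }
    if (!repair) {
        return SKETCH_REJECTED;     /* a) report an error, refuse to open */
    }
    img->autoclear_features &= ~SKETCH_DATA_FILE_RAW;  /* b) clear the bit */
    return SKETCH_REPAIRED;
}
```

The point of the split is that the image is never modified unless the
caller passes repair = true, which corresponds to the explicit opt-in
described above.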

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
  2020-03-17 18:16 ` [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
  2020-04-08 12:42   ` Max Reitz
@ 2020-04-15  7:10   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-15  7:10 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> In order to support extended L2 entries some functions of the qcow2
> driver need to start dealing with subclusters instead of clusters.
> 
> qcow2_get_host_offset() is modified to return the subcluster type
> instead of the cluster type, and all callers are updated to replace
> all values of QCow2ClusterType with their QCow2SubclusterType
> equivalents.
> 
> This patch only changes the data types, there are no semantic changes.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH v4 16/30] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
  2020-03-17 18:16 ` [PATCH v4 16/30] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
@ 2020-04-15  7:28   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-15  7:28 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> When dealing with subcluster types there is a new value called
> QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC that has no equivalent in
> QCow2ClusterType.
> 
> This patch handles that value in all places where subcluster types
> are processed.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
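
As a rough illustration of why the new value has no cluster-level
equivalent, here is a simplified classifier for a single subcluster. It
is a sketch based on my reading of the series, not the code from the
patch: the type and function names are invented, compressed clusters are
left out, and the three inputs stand for "the cluster has a host
offset", "the allocation bit of this subcluster is set" and "the zero
bit of this subcluster is set".

```c
#include <stdbool.h>

/* Invented names; compressed clusters are deliberately left out. */
typedef enum {
    SKETCH_UNALLOCATED_PLAIN,  /* no host cluster, subcluster unallocated */
    SKETCH_UNALLOCATED_ALLOC,  /* host cluster exists, subcluster does not */
    SKETCH_ZERO_PLAIN,         /* reads as zeroes, no host cluster */
    SKETCH_ZERO_ALLOC,         /* reads as zeroes, host cluster exists */
    SKETCH_NORMAL,             /* data is stored in the host cluster */
    SKETCH_INVALID,            /* contradictory bits */
} SketchSubclusterType;

static SketchSubclusterType
sketch_subcluster_type(bool cluster_allocated, bool alloc_bit, bool zero_bit)
{
    if (alloc_bit && zero_bit) {
        return SKETCH_INVALID;      /* cannot be both data and zeroes */
    }
    if (cluster_allocated) {
        if (zero_bit) {
            return SKETCH_ZERO_ALLOC;
        }
        return alloc_bit ? SKETCH_NORMAL : SKETCH_UNALLOCATED_ALLOC;
    }
    if (alloc_bit) {
        return SKETCH_INVALID;      /* allocated bit without a host cluster */
    }
    return zero_bit ? SKETCH_ZERO_PLAIN : SKETCH_UNALLOCATED_PLAIN;
}
```

The UNALLOCATED_ALLOC case only exists because allocation is tracked per
subcluster: the cluster as a whole has a host offset, but this
particular subcluster has never been written.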

-- 
Best regards,
Vladimir



* Re: [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta()
  2020-03-17 18:16 ` [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
@ 2020-04-15  8:39   ` Vladimir Sementsov-Ogievskiy
  2020-04-16 20:01     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-15  8:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> If an image has subclusters then there are more copy-on-write
> scenarios that we need to consider. Let's say we have a write request
> from the middle of subcluster #3 until the end of the cluster:
> 
>     - If the cluster is new, then subclusters #0 to #3 from the old
>       cluster must be copied into the new one.
> 
>     - If the cluster is new but the old cluster was unallocated, then
>       only subcluster #3 needs copy-on-write. #0 to #2 are marked as
>       unallocated in the bitmap of the new L2 entry.
> 
>     - If we are overwriting an old cluster and subcluster #3 is
>       unallocated or has the all-zeroes bit set then we need
>       copy-on-write on subcluster #3.
> 
>     - If we are overwriting an old cluster and subcluster #3 was
>       allocated then there is no need to copy-on-write.
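
The four scenarios above can be condensed into one small function that
returns where the head copy-on-write region starts inside the cluster
(the region ends where the write begins). This is only a sketch that
mirrors the cases listed in the commit message; the enum and the
function name are invented, and unlike the patch it does not assert on
types that cannot occur when the old cluster is kept.

```c
#include <stdbool.h>

/* Invented names; the cases mirror the list in the commit message. */
enum cow_sc_type {
    COW_SC_NORMAL,
    COW_SC_COMPRESSED,
    COW_SC_ZERO_ALLOC,
    COW_SC_UNALLOCATED_ALLOC,
    COW_SC_ZERO_PLAIN,
    COW_SC_UNALLOCATED_PLAIN,
};

/*
 * Return the offset inside the cluster where the head COW region starts.
 * 'type' is the type of the subcluster that contains 'write_start'.
 */
static unsigned sketch_cow_start(enum cow_sc_type type, bool keep_old,
                                 unsigned write_start,
                                 unsigned subcluster_size)
{
    /* Start of the subcluster that contains the first written byte */
    unsigned sc_start = write_start / subcluster_size * subcluster_size;

    if (!keep_old) {
        switch (type) {
        case COW_SC_ZERO_PLAIN:
        case COW_SC_UNALLOCATED_PLAIN:
            return sc_start;    /* old cluster unallocated: one subcluster */
        default:
            return 0;           /* copy everything before the write */
        }
    }
    /* Overwriting in place: no COW if the subcluster already has data */
    return type == COW_SC_NORMAL ? write_start : sc_start;
}
```

For example, with 2 KiB subclusters and a write that starts at offset
7168 (the middle of subcluster #3), the four scenarios give 0, 6144,
6144 and 7168 respectively; the last value equals the write offset,
which means that nothing is copied.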
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c | 140 +++++++++++++++++++++++++++++++++---------
>   1 file changed, 110 insertions(+), 30 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 8cdf8a23b6..c6f3cc9237 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1061,56 +1061,128 @@ void qcow2_alloc_cluster_abort(BlockDriverState *bs, QCowL2Meta *m)
>    * If @keep_old is true it means that the clusters were already
>    * allocated and will be overwritten. If false then the clusters are
>    * new and we have to decrease the reference count of the old ones.
> + *
> + * Returns 1 on success, -errno on failure (in order to match the
> + * return value of handle_copied() and handle_alloc()).

Hmm, honestly, I don't like this idea. handle_copied() and handle_alloc() have special return code semantics, but there is no reason for special semantics here: plain error/success would do. Introducing new semantics (I think no similar functions exist in qcow2-cluster.c, or maybe in the whole qcow2 subsystem) just to save several lines of code, because the function is only used on the return-1 paths of its callers, doesn't seem like a good reason to me.

Or maybe the reason will appear in the following patches? I'll see.

>    */
> -static void calculate_l2_meta(BlockDriverState *bs,
> -                              uint64_t host_cluster_offset,
> -                              uint64_t guest_offset, unsigned bytes,
> -                              uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
> +static int calculate_l2_meta(BlockDriverState *bs, uint64_t host_cluster_offset,
> +                             uint64_t guest_offset, unsigned bytes,
> +                             uint64_t *l2_slice, QCowL2Meta **m, bool keep_old)
>   {
>       BDRVQcow2State *s = bs->opaque;
> -    int l2_index = offset_to_l2_slice_index(s, guest_offset);
> -    uint64_t l2_entry;
> +    int sc_index, l2_index = offset_to_l2_slice_index(s, guest_offset);
> +    uint64_t l2_entry, l2_bitmap;
>       unsigned cow_start_from, cow_end_to;
>       unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>       unsigned cow_end_from = cow_start_to + bytes;
>       unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>       QCowL2Meta *old_m = *m;
> -    QCow2ClusterType type;
> +    QCow2SubclusterType type;
>   
>       assert(nb_clusters <= s->l2_slice_size - l2_index);
>   
> -    /* Return if there's no COW (all clusters are normal and we keep them) */
> +    /* Return if there's no COW (all subclusters are normal and we are
> +     * keeping the clusters) */
>       if (keep_old) {
> +        unsigned first_sc = cow_start_to / s->subcluster_size;
> +        unsigned last_sc = (cow_end_from - 1) / s->subcluster_size;
>           int i;
> -        for (i = 0; i < nb_clusters; i++) {
> -            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
> -            if (qcow2_get_cluster_type(bs, l2_entry) != QCOW2_CLUSTER_NORMAL) {
> +        for (i = first_sc; i <= last_sc; i++) {
> +            unsigned c = i / s->subclusters_per_cluster;
> +            unsigned sc = i % s->subclusters_per_cluster;
> +            l2_entry = get_l2_entry(s, l2_slice, l2_index + c);
> +            l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + c);
> +            type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc);
> +            if (type == QCOW2_SUBCLUSTER_INVALID) {
> +                l2_index += c; /* Point to the invalid entry */
> +                goto fail;
> +            }
> +            if (type != QCOW2_SUBCLUSTER_NORMAL) {
>                   break;
>               }
>           }
> -        if (i == nb_clusters) {
> -            return;
> +        if (i == last_sc + 1) {
> +            return 1;
>           }
>       }
>   
>       /* Get the L2 entry of the first cluster */
>       l2_entry = get_l2_entry(s, l2_slice, l2_index);
> -    type = qcow2_get_cluster_type(bs, l2_entry);
> +    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
> +    sc_index = offset_to_sc_index(s, guest_offset);
> +    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
>   
> -    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> -        cow_start_from = cow_start_to;
> +    if (type == QCOW2_SUBCLUSTER_INVALID) {
> +        goto fail;
> +    }
> +
> +    if (!keep_old) {
> +        switch (type) {
> +        case QCOW2_SUBCLUSTER_NORMAL:
> +        case QCOW2_SUBCLUSTER_COMPRESSED:
> +        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
> +        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
> +            cow_start_from = 0;
> +            break;
> +        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
> +        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
> +            cow_start_from = sc_index << s->subcluster_bits;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
>       } else {
> -        cow_start_from = 0;
> +        switch (type) {
> +        case QCOW2_SUBCLUSTER_NORMAL:
> +            cow_start_from = cow_start_to;
> +            break;
> +        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
> +        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
> +            cow_start_from = sc_index << s->subcluster_bits;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
>       }
>   
>       /* Get the L2 entry of the last cluster */
> -    l2_entry = get_l2_entry(s, l2_slice, l2_index + nb_clusters - 1);
> -    type = qcow2_get_cluster_type(bs, l2_entry);
> +    l2_index += nb_clusters - 1;
> +    l2_entry = get_l2_entry(s, l2_slice, l2_index);
> +    l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index);
> +    sc_index = offset_to_sc_index(s, guest_offset + bytes - 1);
> +    type = qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, sc_index);
>   
> -    if (type == QCOW2_CLUSTER_NORMAL && keep_old) {
> -        cow_end_to = cow_end_from;
> +    if (type == QCOW2_SUBCLUSTER_INVALID) {
> +        goto fail;
> +    }
> +
> +    if (!keep_old) {
> +        switch (type) {

Hmm, a big part of this code is mostly copied from the handling of the first subcluster. I'm not sure it's worth refactoring now, though; maybe later.

> +        case QCOW2_SUBCLUSTER_NORMAL:
> +        case QCOW2_SUBCLUSTER_COMPRESSED:
> +        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
> +        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
> +            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);

Hmm, interesting: we actually don't need to COW QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC subclusters in the COW area. But that would need more modifications to the COW handling.

> +            break;
> +        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
> +        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
> +            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);


This is because in the new cluster we can make the previous subclusters unallocated and not copy them from the backing file.
Hmm, actually, we should not just make them unallocated, but copy that part of the bitmap from the original L2 entry. I need to keep that in mind for the next patches.

> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
>       } else {
> -        cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
> +        switch (type) {
> +        case QCOW2_SUBCLUSTER_NORMAL:
> +            cow_end_to = cow_end_from;
> +            break;
> +        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
> +        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
> +            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
>       }
>   
>       *m = g_malloc0(sizeof(**m));
> @@ -1135,6 +1207,18 @@ static void calculate_l2_meta(BlockDriverState *bs,
>   
>       qemu_co_queue_init(&(*m)->dependent_requests);
>       QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
> +
> +fail:

maybe, s/fail/out/

> +    if (type == QCOW2_SUBCLUSTER_INVALID) {
> +        uint64_t l1_index = offset_to_l1_index(s, guest_offset);
> +        uint64_t l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
> +        qcow2_signal_corruption(bs, true, -1, -1, "Invalid cluster entry found "
> +                                " (L2 offset: %#" PRIx64 ", L2 index: %#x)",
> +                                l2_offset, l2_index);
> +        return -EIO;
> +    }
> +
> +    return 1;
>   }
>   
>   /*
> @@ -1352,10 +1436,8 @@ static int handle_copied(BlockDriverState *bs, uint64_t guest_offset,
>                    - offset_into_cluster(s, guest_offset));
>           assert(*bytes != 0);
>   
> -        calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK, guest_offset,
> -                          *bytes, l2_slice, m, true);
> -
> -        ret = 1;
> +        ret = calculate_l2_meta(bs, cluster_offset & L2E_OFFSET_MASK,
> +                                guest_offset, *bytes, l2_slice, m, true);
>       } else {
>           ret = 0;
>       }
> @@ -1530,10 +1612,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>       *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>       assert(*bytes != 0);
>   
> -    calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, l2_slice,
> -                      m, false);
> -
> -    ret = 1;
> +    ret = calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
> +                            l2_slice, m, false);
>   
>   out:
>       qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
> 

Anyway, the patch should work as intended, so if you want to keep it as is:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>


-- 
Best regards,
Vladimir



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
  2020-04-14 14:50       ` Alberto Garcia
@ 2020-04-15 19:11       ` Alberto Garcia
  2020-04-15 21:13         ` Eric Blake
  1 sibling, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-15 19:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Fri 10 Apr 2020 11:29:59 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> Should we also document that extended L2 entries are incompatible
>> with raw external files? [...] After all, when raw external file is
>> enabled, the entire image is allocated, at which point subclusters
>> don't make much sense.
>
> It may still cache information about zeroed subclusters, which gives a
> more detailed block status.

So shall I forbid extended_l2 + data_file_raw then?

I wonder, if the only problem is that it's just not very useful, does it
make sense to add additional complexity and restrictions to the code
simply to prevent the user from making a sub-optimal choice?

Berto



* Re: [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature
  2020-04-15 19:11       ` Alberto Garcia
@ 2020-04-15 21:13         ` Eric Blake
  0 siblings, 0 replies; 128+ messages in thread
From: Eric Blake @ 2020-04-15 21:13 UTC (permalink / raw)
  To: Alberto Garcia, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On 4/15/20 2:11 PM, Alberto Garcia wrote:
> On Fri 10 Apr 2020 11:29:59 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>> Should we also document that extended L2 entries are incompatible
>>> with raw external files? [...] After all, when raw external file is
>>> enabled, the entire image is allocated, at which point subclusters
>>> don't make much sense.
>>
>> It may still cache information about zeroed subclusters, which gives a
>> more detailed block status.

That's a good point about one reason why it might be useful.

> 
> So shall I forbid extended_l2 + data_file_raw then?
> 
> I wonder, if the only problem is that it's just not very useful, does it
> make sense to add additional complexity and restrictions to the code
> simply to prevent the user from making a sub-optimal choice?

At this point, I'm not seeing a technical reason why we have to forbid 
subclusters with data-file-raw.  Mixing may be inefficient compared to 
using data-file-raw without subclusters, but inefficiencies are not 
worth the code bloat to forbid the combination.  If we come up with a 
scenario where the mix would cause data corruption, that's a different 
story, but I'm not seeing such a reason at the moment.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta()
  2020-04-15  8:39   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-16 20:01     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-16 20:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 15 Apr 2020 10:39:26 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> + * Returns 1 on success, -errno on failure (in order to match the
>> + * return value of handle_copied() and handle_alloc()).
>
> Hmm, honestly, I don't like this idea. handle_copied() and handle_alloc()
> have special return code semantics. There is no reason for special
> semantics here, just classic error/success.

Right, the only reason is to avoid adding something like this after all
callers:

        if (ret == 0) {
            ret = 1;
        }

But you have a point, maybe I'll change it after all.
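
For comparison, this is roughly what the classic convention would look
like: the helper returns 0 or -errno, and each caller translates success
into its own return value. Both functions are hypothetical stand-ins for
calculate_l2_meta() and its callers; only the return-code handling is
the point here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: 0 on success, -errno on failure. */
static int sketch_calculate(bool entry_is_valid)
{
    return entry_is_valid ? 0 : -EIO;
}

/* Hypothetical caller that reports success as 1, like the callers above. */
static int sketch_handle(bool entry_is_valid)
{
    int ret = sketch_calculate(entry_is_valid);

    if (ret == 0) {
        ret = 1;    /* translate plain success into the caller's value */
    }
    return ret;
}
```

The cost is the extra translation in every caller, which is the
trade-off being discussed.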

>> +        case QCOW2_SUBCLUSTER_NORMAL:
>> +        case QCOW2_SUBCLUSTER_COMPRESSED:
>> +        case QCOW2_SUBCLUSTER_ZERO_ALLOC:
>> +        case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
>> +            cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>
> Hmm, interesting: we actually don't need to COW
> QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC subclusters in the COW area. But that
> would need more modifications to the COW handling.

True, if there are more unallocated subclusters in the cow area we could
make the copy operation smaller. I'm not sure if it's worth adding extra
code for this, but maybe I can leave a comment.
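
Just to sketch the idea (this is not part of the series): if the
allocation bits of the old cluster are available, the tail COW region
could be shortened by dropping trailing subclusters that were never
allocated. The function name is made up, the region is expressed in
subcluster indexes with an exclusive end of at most 32, and zero bits
are ignored to keep the example small.

```c
#include <stdint.h>

/*
 * Shrink the COW range [first_sc, end_sc) by dropping trailing
 * subclusters whose allocation bit is not set in 'alloc_bits'.
 * Returns the new exclusive end. Requires end_sc <= 32.
 */
static unsigned sketch_trim_cow_end(uint32_t alloc_bits, unsigned first_sc,
                                    unsigned end_sc)
{
    while (end_sc > first_sc &&
           !(alloc_bits & ((uint32_t)1 << (end_sc - 1)))) {
        end_sc--;   /* nothing to copy from this subcluster */
    }
    return end_sc;
}
```

If only subclusters #0 to #3 are allocated and the COW region covers
subclusters #2 to #31, the copy shrinks to subclusters #2 and #3.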

>> +            break;
>> +        case QCOW2_SUBCLUSTER_ZERO_PLAIN:
>> +        case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
>> +            cow_end_to = ROUND_UP(cow_end_from, s->subcluster_size);
>
>
> This is because in the new cluster we can make the previous subclusters
> unallocated and not copy them from the backing file.
> Hmm, actually, we should not just make them unallocated, but copy that
> part of the bitmap from the original L2 entry. I need to keep that in
> mind for the next patches.

The bitmap is always copied from the original L2 entry; you can see it
in the patch "qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()".
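
Conceptually it works like the sketch below: start from the old bitmap,
then mark every subcluster touched by the write or by COW as allocated
and no longer zero. The macro and function names are invented for this
example, and the layout (bits 0-31 are allocation bits, bits 32-63 are
zero bits) is an assumption about the bit order, so check it against the
spec patch.

```c
#include <stdint.h>

/* Assumed layout: bits 0-31 = allocated, bits 32-63 = reads as zeroes. */
#define SKETCH_SUB_ALLOC(x)  (1ULL << (x))
#define SKETCH_SUB_ZERO(x)   (1ULL << ((x) + 32))

/* Mark subclusters first_sc..last_sc (inclusive) as allocated data. */
static uint64_t sketch_update_bitmap(uint64_t old_bitmap,
                                     unsigned first_sc, unsigned last_sc)
{
    uint64_t bitmap = old_bitmap;   /* everything else is inherited */
    unsigned sc;

    for (sc = first_sc; sc <= last_sc; sc++) {
        bitmap |= SKETCH_SUB_ALLOC(sc);
        bitmap &= ~SKETCH_SUB_ZERO(sc);
    }
    return bitmap;
}
```

Subclusters outside the written range keep whatever state they had in
the original entry, which is the behaviour described above.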

Berto



* Re: [PATCH v4 00/30] Add subcluster allocation to qcow2
  2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
                   ` (29 preceding siblings ...)
  2020-03-17 18:16 ` [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
@ 2020-04-21  5:06 ` Derek Su
  2020-04-21 10:35   ` Alberto Garcia
  30 siblings, 1 reply; 128+ messages in thread
From: Derek Su @ 2020-04-21  5:06 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel


Hello,

This work is promising and interesting.
I'd like to try this new feature.
Could you please export a branch because the patches cannot be applied to current master?
Thanks.

Regards,
Derek


On 2020/3/18 2:15 AM, Alberto Garcia wrote:
> Hi,
>
> here's the new version of the patches to add subcluster allocation
> support to qcow2.
>
> Please refer to the cover letter of the first version for a full
> description of the patches:
>
>     https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html
>
> I think that this version fixes all the problems pointed out by Max
> and Eric during the review a couple of weeks ago. I also dropped the
> RFC tag.
>
> Berto
>
> v4:
> - Patch 01: New patch
> - Patch 02: New patch
> - Patch 05: Documentation updates [Eric]
> - Patch 06: Fix rebase conflicts
> - Patch 07: Change bit order in the subcluster allocation bitmap.
>              Change incompatible bit number. [Max, Eric]
> - Patch 09: Rename QCOW_MAX_SUBCLUSTERS_PER_CLUSTER to
>              QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER [Eric]
> - Patch 13: Change bit order in the subcluster allocation bitmap [Max, Eric]
>              Add more documentation.
>              Ignore the subcluster bitmap in the L2 entries of
>              compressed clusters.
> - Patch 14: New patch
> - Patch 15: Update to work with the changes from patches 02 and 14.
> - Patch 16: Update to work with the changes from patches 02 and 14.
> - Patch 18: Update to work with the changes from patches 02 and 14.
>              Update documentation.
>              Fix return value on early exit.
> - Patch 20: Make sure to clear the subcluster allocation bitmap when a
>              cluster is unallocated.
> - Patch 26: Update to work with the changes from patch 14.
> - Patch 27: New patch [Max]
> - Patch 28: Update version number, incompatible bit number and test
>              expectations.
> - Patch 30: Add new tests.
>              Make the test verify its own results. [Max]
>
> v3: https://lists.gnu.org/archive/html/qemu-block/2019-12/msg00587.html
> - Patch 01: Rename host_offset to host_cluster_offset and make 'bytes'
>              an unsigned int [Max]
> - Patch 03: Rename cluster_needs_cow to cluster_needs_new_alloc and
>              count_cow_clusters to count_single_write_clusters. Update
>              documentation and add more assertions and checks [Max]
> - Patch 09: Update qcow2_co_truncate() to properly support extended L2
>              entries [Max]
> - Patch 10: Forbid calling set_l2_bitmap() if the image does not have
>              extended L2 entries [Max]
> - Patch 11 (new): Add QCow2SubclusterType [Max]
> - Patch 12 (new): Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
> - Patch 13 (new): Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
> - Patch 14: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
> - Patch 15: Use QCow2SubclusterType instead of QCow2ClusterType [Max]
> - Patch 19: Don't call set_l2_bitmap() if the image does not have
>              extended L2 entries [Max]
> - Patch 21: Use smaller data types.
> - Patch 22: Don't call set_l2_bitmap() if the image does not have
>              extended L2 entries [Max]
> - Patch 23: Use smaller data types.
> - Patch 25: Update test results and documentation. Move the check for
>              the minimum subcluster size to validate_cluster_size().
> - Patch 26 (new): Add subcluster support to qcow2_measure()
> - Patch 27: Add more tests
>
> v2: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg01642.html
> - Patch 12: Update after the changes in 88f468e546.
> - Patch 21 (new): Clear the L2 bitmap when allocating a compressed
>    cluster. Compressed clusters should have the bitmap all set to 0.
> - Patch 24: Document the new fields in the QAPI documentation [Eric].
> - Patch 25: Allow qcow2 preallocation with backing files.
> - Patch 26: Add some tests for qcow2 images with extended L2 entries.
>
> v1: https://lists.gnu.org/archive/html/qemu-block/2019-10/msg00983.html
>
> Output of git backport-diff against v3:
>
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
>
> 001/30:[down] 'qcow2: Make Qcow2AioTask store the full host offset'
> 002/30:[down] 'qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()'
> 003/30:[----] [-C] 'qcow2: Add calculate_l2_meta()'
> 004/30:[----] [--] 'qcow2: Split cluster_needs_cow() out of count_cow_clusters()'
> 005/30:[0020] [FC] 'qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()'
> 006/30:[0010] [FC] 'qcow2: Add get_l2_entry() and set_l2_entry()'
> 007/30:[0020] [FC] 'qcow2: Document the Extended L2 Entries feature'
> 008/30:[----] [--] 'qcow2: Add dummy has_subclusters() function'
> 009/30:[0004] [FC] 'qcow2: Add subcluster-related fields to BDRVQcow2State'
> 010/30:[----] [--] 'qcow2: Add offset_to_sc_index()'
> 011/30:[----] [-C] 'qcow2: Add l2_entry_size()'
> 012/30:[----] [--] 'qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()'
> 013/30:[0046] [FC] 'qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()'
> 014/30:[down] 'qcow2: Add cluster type parameter to qcow2_get_host_offset()'
> 015/30:[0082] [FC] 'qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*'
> 016/30:[0002] [FC] 'qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC'
> 017/30:[----] [-C] 'qcow2: Add subcluster support to calculate_l2_meta()'
> 018/30:[down] 'qcow2: Add subcluster support to qcow2_get_host_offset()'
> 019/30:[----] [--] 'qcow2: Add subcluster support to zero_in_l2_slice()'
> 020/30:[0012] [FC] 'qcow2: Add subcluster support to discard_in_l2_slice()'
> 021/30:[----] [--] 'qcow2: Add subcluster support to check_refcounts_l2()'
> 022/30:[----] [--] 'qcow2: Fix offset calculation in handle_dependencies()'
> 023/30:[----] [-C] 'qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()'
> 024/30:[----] [--] 'qcow2: Clear the L2 bitmap when allocating a compressed cluster'
> 025/30:[----] [--] 'qcow2: Add subcluster support to handle_alloc_space()'
> 026/30:[0006] [FC] 'qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only'
> 027/30:[down] 'qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters'
> 028/30:[0019] [FC] 'qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit'
> 029/30:[----] [--] 'qcow2: Add subcluster support to qcow2_measure()'
> 030/30:[0313] [FC] 'iotests: Add tests for qcow2 images with extended L2 entries'
>
> Alberto Garcia (30):
>    qcow2: Make Qcow2AioTask store the full host offset
>    qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset()
>    qcow2: Add calculate_l2_meta()
>    qcow2: Split cluster_needs_cow() out of count_cow_clusters()
>    qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()
>    qcow2: Add get_l2_entry() and set_l2_entry()
>    qcow2: Document the Extended L2 Entries feature
>    qcow2: Add dummy has_subclusters() function
>    qcow2: Add subcluster-related fields to BDRVQcow2State
>    qcow2: Add offset_to_sc_index()
>    qcow2: Add l2_entry_size()
>    qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap()
>    qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type()
>    qcow2: Add cluster type parameter to qcow2_get_host_offset()
>    qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_*
>    qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC
>    qcow2: Add subcluster support to calculate_l2_meta()
>    qcow2: Add subcluster support to qcow2_get_host_offset()
>    qcow2: Add subcluster support to zero_in_l2_slice()
>    qcow2: Add subcluster support to discard_in_l2_slice()
>    qcow2: Add subcluster support to check_refcounts_l2()
>    qcow2: Fix offset calculation in handle_dependencies()
>    qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
>    qcow2: Clear the L2 bitmap when allocating a compressed cluster
>    qcow2: Add subcluster support to handle_alloc_space()
>    qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
>    qcow2: Assert that expand_zero_clusters_in_l1() does not support
>      subclusters
>    qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit
>    qcow2: Add subcluster support to qcow2_measure()
>    iotests: Add tests for qcow2 images with extended L2 entries
>
>   docs/interop/qcow2.txt           |  68 ++-
>   docs/qcow2-cache.txt             |  19 +-
>   qapi/block-core.json             |   7 +
>   block/qcow2.h                    | 178 +++++++-
>   include/block/block_int.h        |   1 +
>   block/qcow2-cluster.c            | 696 ++++++++++++++++++++-----------
>   block/qcow2-refcount.c           |  38 +-
>   block/qcow2.c                    | 257 +++++++-----
>   tests/qemu-iotests/031.out       |   8 +-
>   tests/qemu-iotests/036.out       |   4 +-
>   tests/qemu-iotests/049.out       | 102 ++---
>   tests/qemu-iotests/060.out       |   1 +
>   tests/qemu-iotests/061           |   6 +
>   tests/qemu-iotests/061.out       |  25 +-
>   tests/qemu-iotests/065           |  18 +-
>   tests/qemu-iotests/082.out       |  48 ++-
>   tests/qemu-iotests/085.out       |  38 +-
>   tests/qemu-iotests/144.out       |   4 +-
>   tests/qemu-iotests/182.out       |   2 +-
>   tests/qemu-iotests/185.out       |   8 +-
>   tests/qemu-iotests/198.out       |   2 +
>   tests/qemu-iotests/206.out       |   4 +
>   tests/qemu-iotests/242.out       |   5 +
>   tests/qemu-iotests/255.out       |   8 +-
>   tests/qemu-iotests/271           | 359 ++++++++++++++++
>   tests/qemu-iotests/271.out       | 244 +++++++++++
>   tests/qemu-iotests/280.out       |   2 +-
>   tests/qemu-iotests/common.filter |   1 +
>   tests/qemu-iotests/group         |   1 +
>   29 files changed, 1682 insertions(+), 472 deletions(-)
>   create mode 100755 tests/qemu-iotests/271
>   create mode 100644 tests/qemu-iotests/271.out
>




* Re: [PATCH v4 00/30] Add subcluster allocation to qcow2
  2020-04-21  5:06 ` [PATCH v4 00/30] Add subcluster allocation to qcow2 Derek Su
@ 2020-04-21 10:35   ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-21 10:35 UTC (permalink / raw)
  To: Derek Su, qemu-devel

On Tue 21 Apr 2020 07:06:42 AM CEST, Derek Su <dereksu@qnap.com> wrote:
> Hello,
>
> This work is promising and interesting.
> I'd like to try this new feature.
> Could you please export a branch because the patches cannot be applied
> to current master?

Hi,

you can apply v4 on top of 3189e9d38c82266ea5750a81255fd229c7ddf1e6

I also plan to publish v5 this week.

Any feedback is very much appreciated.

Thanks!

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset()
  2020-03-17 18:16 ` [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset() Alberto Garcia
  2020-04-08 12:49   ` Max Reitz
@ 2020-04-22  8:07   ` Vladimir Sementsov-Ogievskiy
  2020-04-22 11:54     ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22  8:07 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> The logic of this function remains pretty much the same, except that
> it uses count_contiguous_subclusters(), which combines the logic of
> count_contiguous_clusters() / count_contiguous_clusters_unallocated()
> and checks individual subclusters.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---

[..]

> +static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
> +                                        unsigned sc_index, uint64_t *l2_slice,
> +                                        int l2_index)
>   {
>       BDRVQcow2State *s = bs->opaque;

This is pre-existing, but is it worth asserting that all nb_clusters are within this l2_slice?

[..]

> +        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
> +            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
> +                goto out;

Why not just return count from here? Then you don't need the goto at all. Hmm, maybe the out: code will be extended in later patches.

> +            }
> +            count++;
>           }
> +        expected_offset += s->cluster_size;
>       }
>   
> -    return i;
> +out:
> +    return count;
>   }
>   

[..]

> @@ -607,21 +607,20 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
>               goto fail;
>           }
>           /* Compressed clusters can only be processed one by one */
> -        c = 1;
> +        sc = s->subclusters_per_cluster - sc_index;

Shouldn't we assert here that sc_index == 0? Otherwise the caller is definitely doing something wrong.

>           *host_offset = l2_entry & L2E_COMPRESSED_OFFSET_SIZE_MASK;
>           break;
> -    case QCOW2_CLUSTER_ZERO_PLAIN:
> -    case QCOW2_CLUSTER_UNALLOCATED:
> -        /* how many empty clusters ? */
> -        c = count_contiguous_clusters_unallocated(bs, nb_clusters,
> -                                                  l2_slice, l2_index, type);
> +    case QCOW2_SUBCLUSTER_ZERO_PLAIN:
> +    case QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN:
> +        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
> +                                          l2_slice, l2_index);
>           *host_offset = 0;
>           break;
> -    case QCOW2_CLUSTER_ZERO_ALLOC:
> -    case QCOW2_CLUSTER_NORMAL:
> -        /* how many allocated clusters ? */
> -        c = count_contiguous_clusters(bs, nb_clusters, s->cluster_size,
> -                                      l2_slice, l2_index, QCOW_OFLAG_ZERO);
> +    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
> +    case QCOW2_SUBCLUSTER_NORMAL:
> +    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
> +        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
> +                                          l2_slice, l2_index);
>           *host_offset = l2_entry & L2E_OFFSET_MASK;
>           if (offset_into_cluster(s, *host_offset)) {

Hmm, you may move "sc = count_contiguous_subclusters" to be after the switch-block, as it is universal now. And keep only offset calculation and error checking in the switch-block.

>               qcow2_signal_corruption(bs, true, -1, -1,
> @@ -651,7 +650,7 @@ int qcow2_get_host_offset(BlockDriverState *bs, uint64_t offset,
>   
>       qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
>   
> -    bytes_available = (int64_t)c * s->cluster_size;
> +    bytes_available = ((int64_t)sc + sc_index) << s->subcluster_bits;
>   
>   out:
>       if (bytes_available > bytes_needed) {
> @@ -664,7 +663,7 @@ out:
>       assert(bytes_available - offset_in_cluster <= UINT_MAX);
>       *bytes = bytes_available - offset_in_cluster;
>   
> -    *subcluster_type = qcow2_cluster_to_subcluster_type(type);
> +    *subcluster_type = type;
>   
>       return 0;
>   
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice()
  2020-03-17 18:16 ` [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
@ 2020-04-22 11:06   ` Vladimir Sementsov-Ogievskiy
  2020-04-22 12:53     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22 11:06 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

still, some comments below

> ---
>   block/qcow2-cluster.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 6f2643ba53..746006a117 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1897,7 +1897,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,

As far as I can see, the function is not prepared to handle an unaligned offset. Is it worth adding an assertion while we are here?

>       assert(nb_clusters <= INT_MAX);
>   
>       for (i = 0; i < nb_clusters; i++) {
> -        uint64_t old_offset;
> +        uint64_t old_offset, l2_entry = 0;

I'd rename s/old_offset/old_l2_entry

>           QCow2ClusterType cluster_type;
>   
>           old_offset = get_l2_entry(s, l2_slice, l2_index + i);

more context:

 >         /*
 >          * Minimize L2 changes if the cluster already reads back as
 >          * zeroes with correct allocation.
 >          */
 >         cluster_type = qcow2_get_cluster_type(bs, old_offset);
 >         if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
 >             (cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {

Is it worth asserting !has_subclusters(s) here, or marking the image as corrupted?

 >             continue;
 >         }


> @@ -1914,12 +1914,18 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>   
>           qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>           if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
> -            set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>               qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>           } else {
> -            uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
> -            set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
> +            l2_entry = get_l2_entry(s, l2_slice, l2_index + i);
>           }
> +
> +        if (has_subclusters(s)) {
> +            set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
> +        } else {
> +            l2_entry |= QCOW_OFLAG_ZERO;
> +        }
> +
> +        set_l2_entry(s, l2_slice, l2_index + i, l2_entry);

In the subclusters && !unmap case we set the same value, and we don't even need to get it.

Maybe:

           if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
               qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
               set_l2_entry(s, l2_slice, l2_index + i,
                            has_subclusters(s) ? 0 : QCOW_OFLAG_ZERO);
           } else if (!has_subclusters(s)) {
               uint64_t entry = get_l2_entry(s, l2_slice, l2_index + i);
               set_l2_entry(s, l2_slice, l2_index + i, entry | QCOW_OFLAG_ZERO);
           }

           if (has_subclusters(s)) {
               set_l2_bitmap(s, l2_slice, l2_index + i, QCOW_L2_BITMAP_ALL_ZEROES);
           }




>       }
>   
>       qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-03-17 18:16 ` [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
  2020-04-09 10:05   ` Max Reitz
@ 2020-04-22 11:35   ` Vladimir Sementsov-Ogievskiy
  2020-04-22 17:42     ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22 11:35 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> Two changes are needed in this function:
> 
> 1) A full discard deallocates a cluster so we can skip the operation if
>     it is already unallocated. With extended L2 entries however if any
>     of the subclusters has the 'all zeroes' bit set then we have to
>     clear it.
> 
> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>     image has extended L2 entries. Instead, the individual 'all zeroes'
>     bits must be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cluster.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 746006a117..824c710760 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>            * TODO We might want to use bdrv_block_status(bs) here, but we're
>            * holding s->lock, so that doesn't work today.
>            *
> -         * If full_discard is true, the sector should not read back as zeroes,
> +         * If full_discard is true, the cluster should not read back as zeroes,
>            * but rather fall through to the backing file.
>            */
>           switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>           case QCOW2_CLUSTER_UNALLOCATED:
> -            if (full_discard || !bs->backing) {
> +            if (full_discard) {
> +                /* If the image has extended L2 entries we can only
> +                 * skip this operation if the L2 bitmap is zero. */
> +                uint64_t bitmap = has_subclusters(s) ?
> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
> +                if (bitmap == 0) {
> +                    continue;
> +                }
> +            } else if (!bs->backing) {
>                   continue;
>               }

Hmm, so you do continue if full_discard is false AND bitmap != 0 && !bs->backing,
but you do not continue if full_discard is true AND bitmap != 0 && !bs->backing (as you will not reach the "else" branch).

That looks like a mistake.

I think the correct condition is

if (!bs->backing || full_discard && !get_l2_bitmap(s, l2_slice, l2_index + i))

but for that we also need


--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -565,6 +565,7 @@ static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
      return be64_to_cpu(l2_slice[idx]);
  }

+/* Return l2-entry bitmap if image has subclusters and 0 otherwise. */
  static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
                                       int idx)
  {
@@ -572,7 +573,6 @@ static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
          idx *= l2_entry_size(s) / sizeof(uint64_t);
          return be64_to_cpu(l2_slice[idx + 1]);
      } else {
-        /* For convenience only; the caller should ignore this value. */
          return 0;
      }
  }

or, if you don't want that, keep it explicit:

if (!bs->backing || full_discard && (!has_subclusters(s) || !get_l2_bitmap(s, l2_slice, l2_index + i)))


=====

In the QCOW2_CLUSTER_ZERO_PLAIN case, is it worth asserting !has_subclusters(s) or marking the image as corrupted?

>               break;
> @@ -1817,7 +1825,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>   
>           /* First remove L2 entries */
>           qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> -        if (!full_discard && s->qcow_version >= 3) {
> +        if (has_subclusters(s)) {
> +            set_l2_entry(s, l2_slice, l2_index + i, 0);
> +            set_l2_bitmap(s, l2_slice, l2_index + i,
> +                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
> +        } else if (!full_discard && s->qcow_version >= 3) {
>               set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>           } else {
>               set_l2_entry(s, l2_slice, l2_index + i, 0);
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset()
  2020-04-22  8:07   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-22 11:54     ` Alberto Garcia
  2020-04-22 12:18       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-22 11:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 22 Apr 2020 10:07:30 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> +static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
>> +                                        unsigned sc_index, uint64_t *l2_slice,
>> +                                        int l2_index)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>
> preexist, but, worth asserting that nb_clusters are all in this
> l2_slice?

Ok.

>> +        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
>> +            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
>> +                goto out;
>
> why not just return count from here? And then you don't need goto at
> all. Hmm, may be out: code will be extended in further patches..

It's not extended in further patches. I generally prefer having a single
exit point but you're right that it probably doesn't make sense here.
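
For illustration, the early-return shape discussed above could look like the following standalone sketch. It is not the QEMU function: the types[] array is a hypothetical flat list of per-subcluster types standing in for qcow2_get_subcluster_type(), and the host offset contiguity check (expected_offset) is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many consecutive subclusters share the type of the first
 * one, starting at sc_index in the first cluster. types[] holds
 * nb_clusters * sc_per_cluster entries (hypothetical model data).
 */
static int count_contiguous_same_type(const int *types, int nb_clusters,
                                      unsigned sc_per_cluster,
                                      unsigned sc_index)
{
    int count = 0;

    assert(nb_clusters > 0 && sc_index < sc_per_cluster);
    int type = types[sc_index];

    for (int i = 0; i < nb_clusters; i++) {
        for (unsigned j = (i == 0) ? sc_index : 0; j < sc_per_cluster; j++) {
            if (types[(size_t)i * sc_per_cluster + j] != type) {
                return count; /* early return, no "out:" label needed */
            }
            count++;
        }
    }
    return count;
}
```

With two clusters of four subclusters each, types {0,1,1,1, 1,1,2,2} and sc_index 1, the count stops at the first subcluster of a different type.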

>>           /* Compressed clusters can only be processed one by one */
>> -        c = 1;
>> +        sc = s->subclusters_per_cluster - sc_index;
>
> should not we assert here that sc_index == 0? Otherwise the caller
> definitely doing something wrong.

No, no, the guest offset doesn't need to be cluster aligned so sc_index
can perfectly be != 0.

>> +    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
>> +    case QCOW2_SUBCLUSTER_NORMAL:
>> +    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
>> +        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
>> +                                          l2_slice, l2_index);
>>           *host_offset = l2_entry & L2E_OFFSET_MASK;
>>           if (offset_into_cluster(s, *host_offset)) {
>
> Hmm, you may move "sc = count_contiguous_subclusters" to be after the
> switch-block, as it is universal now. And keep only offset calculation
> and error checking in the switch-block.

That's actually a good idea, thanks !! (plus we actually get to use the
QCOW2_SUBCLUSTER_COMPRESSED check in count_contiguous_subclusters(),
which is currently dead code).

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2()
  2020-03-17 18:16 ` [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
@ 2020-04-22 12:06   ` Vladimir Sementsov-Ogievskiy
  2020-04-23 15:45     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22 12:06 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
> image has subclusters. Instead, the individual 'all zeroes' bits must
> be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

The patch itself seems correct. Still, it would be good to also check whether QCOW_OFLAG_ZERO is set in the subclusters case, add a corresponding corruptions++, and maybe even fix it (by using QCOW_L2_BITMAP_ALL_ZEROES instead).

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

> ---
>   block/qcow2-refcount.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 3b89a97fd0..9337496c84 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1686,8 +1686,13 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>                           int ign = active ? QCOW2_OL_ACTIVE_L2 :
>                                              QCOW2_OL_INACTIVE_L2;
>   
> -                        l2_entry = QCOW_OFLAG_ZERO;
> -                        set_l2_entry(s, l2_table, i, l2_entry);
> +                        if (has_subclusters(s)) {
> +                            set_l2_entry(s, l2_table, i, 0);
> +                            set_l2_bitmap(s, l2_table, i,
> +                                          QCOW_L2_BITMAP_ALL_ZEROES);
> +                        } else {
> +                            set_l2_entry(s, l2_table, i, QCOW_OFLAG_ZERO);
> +                        }
>                           ret = qcow2_pre_write_overlap_check(bs, ign,
>                                   l2e_offset, l2_entry_size(s), false);
>                           if (ret < 0) {
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset()
  2020-04-22 11:54     ` Alberto Garcia
@ 2020-04-22 12:18       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22 12:18 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

22.04.2020 14:54, Alberto Garcia wrote:
> On Wed 22 Apr 2020 10:07:30 AM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>> +static int count_contiguous_subclusters(BlockDriverState *bs, int nb_clusters,
>>> +                                        unsigned sc_index, uint64_t *l2_slice,
>>> +                                        int l2_index)
>>>    {
>>>        BDRVQcow2State *s = bs->opaque;
>>
>> preexist, but, worth asserting that nb_clusters are all in this
>> l2_slice?
> 
> Ok.
> 
>>> +        for (j = (i == 0) ? sc_index : 0; j < s->subclusters_per_cluster; j++) {
>>> +            if (qcow2_get_subcluster_type(bs, l2_entry, l2_bitmap, j) != type) {
>>> +                goto out;
>>
>> why not just return count from here? And then you don't need goto at
>> all. Hmm, may be out: code will be extended in further patches..
> 
> It's not extended in further patches. I generally prefer having a single
> exit point but you're right that it probably doesn't make sense here.
> 
>>>            /* Compressed clusters can only be processed one by one */
>>> -        c = 1;
>>> +        sc = s->subclusters_per_cluster - sc_index;
>>
>> should not we assert here that sc_index == 0? Otherwise the caller
>> definitely doing something wrong.
> 
> No, no, the guest offset doesn't need to be cluster aligned so sc_index
> can perfectly be != 0.

Hmm, yes. The only caller doesn't actually call count_contiguous_subclusters() in the compressed cluster case, but it may be refactored to do so, and then it does

   bytes_available = ((int64_t)sc + sc_index) << s->subcluster_bits;

so even if the intermediate sc is not very meaningful for compressed clusters (we can't access a sub-chunk of a compressed cluster in any way), the resulting bytes_available is meaningful and it relies on sc being exactly what it is.

Ok
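
A standalone sketch of that arithmetic, just to make it concrete. This is not QEMU code: struct geom and the helper names are made up for the example and only mirror the few BDRVQcow2State fields involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the BDRVQcow2State fields used here. */
struct geom {
    unsigned cluster_bits;            /* log2(cluster size) */
    unsigned subcluster_bits;         /* log2(subcluster size) */
    unsigned subclusters_per_cluster; /* cluster size / subcluster size */
};

/* Index of the subcluster containing guest_offset within its cluster. */
static unsigned sc_index_of(const struct geom *g, uint64_t guest_offset)
{
    uint64_t in_cluster = guest_offset & ((UINT64_C(1) << g->cluster_bits) - 1);
    return (unsigned)(in_cluster >> g->subcluster_bits);
}

/* For a compressed cluster, sc covers the rest of the cluster. */
static unsigned compressed_sc_count(const struct geom *g, unsigned sc_index)
{
    assert(sc_index < g->subclusters_per_cluster);
    return g->subclusters_per_cluster - sc_index;
}

/* Bytes available, measured from the start of the cluster. */
static int64_t bytes_available_from(const struct geom *g, unsigned sc,
                                    unsigned sc_index)
{
    return ((int64_t)sc + sc_index) << g->subcluster_bits;
}
```

Assuming 64k clusters with 32 subclusters of 2k each, a guest offset inside the sixth subcluster gives sc_index = 5 and sc = 27, and bytes_available still comes out as one full cluster.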

> 
>>> +    case QCOW2_SUBCLUSTER_ZERO_ALLOC:
>>> +    case QCOW2_SUBCLUSTER_NORMAL:
>>> +    case QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC:
>>> +        sc = count_contiguous_subclusters(bs, nb_clusters, sc_index,
>>> +                                          l2_slice, l2_index);
>>>            *host_offset = l2_entry & L2E_OFFSET_MASK;
>>>            if (offset_into_cluster(s, *host_offset)) {
>>
>> Hmm, you may move "sc = count_contiguous_subclusters" to be after the
>> switch-block, as it is universal now. And keep only offset calculation
>> and error checking in the switch-block.
> 
> That's actually a good idea, thanks !! (plus we actually get to use the
> QCOW2_SUBCLUSTER_COMPRESSED check in count_contiguous_subclusters(),
> which is currently dead code).
> 
> Berto
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies()
  2020-03-17 18:16 ` [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
@ 2020-04-22 12:38   ` Vladimir Sementsov-Ogievskiy
  2020-04-23 15:50     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22 12:38 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> l2meta_cow_start() and l2meta_cow_end() are not necessarily
> cluster-aligned if the image has subclusters, so update the
> calculation of old_start and old_end to guarantee that no two requests
> try to write on the same cluster.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

Somehow this patch tells me: "hey, there may be a lot of other small places that we forgot to fix for subclusters, and you have no idea how to find and check them all" :) Probably the only way is to review the whole qcow2 code, but that's too huge a task. [this is just thinking out loud]

Actually, you call it a "Fix", and it seems to be a fix for your "[PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta()". Shouldn't it be squashed into that patch?

> ---
>   block/qcow2-cluster.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 824c710760..ceacd91ea3 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1306,8 +1306,8 @@ static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
>   
>           uint64_t start = guest_offset;
>           uint64_t end = start + bytes;
> -        uint64_t old_start = l2meta_cow_start(old_alloc);
> -        uint64_t old_end = l2meta_cow_end(old_alloc);
> +        uint64_t old_start = start_of_cluster(s, l2meta_cow_start(old_alloc));
> +        uint64_t old_end = ROUND_UP(l2meta_cow_end(old_alloc), s->cluster_size);
>   
>           if (end <= old_start || start >= old_end) {
>               /* No intersection */
> 
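
To see why the rounding in the hunk above matters, here is a small standalone sketch (not QEMU code; the helper names are invented, and the intersection test mirrors the "No intersection" check in the diff). Two subcluster-sized ranges inside the same cluster do not overlap as byte ranges, but they do once the old range is widened to cluster boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Round down to a cluster boundary; cluster_size must be a power of two. */
static uint64_t round_down_to_cluster(uint64_t offset, uint64_t cluster_size)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    return offset & ~(cluster_size - 1);
}

/* Round up to a cluster boundary; already aligned offsets are unchanged. */
static uint64_t round_up_to_cluster(uint64_t offset, uint64_t cluster_size)
{
    return round_down_to_cluster(offset + cluster_size - 1, cluster_size);
}

/* Do the half-open ranges [start, end) and [old_start, old_end) overlap? */
static int ranges_intersect(uint64_t start, uint64_t end,
                            uint64_t old_start, uint64_t old_end)
{
    return !(end <= old_start || start >= old_end);
}
```

For example, with 64k clusters an in-flight COW region of [2048, 4096) and a new request for [8192, 10240) only collide after the old region is rounded to [0, 65536).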


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice()
  2020-04-22 11:06   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-22 12:53     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-22 12:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 22 Apr 2020 01:06:42 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> @@ -1897,7 +1897,7 @@ static int zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>
> As I see, function is not prepared to handle unaligned offset. Worth
> add an assertion while being here?

The only caller already asserts that, and the length parameter is not
even the number of bytes but the number of clusters, so I don't think
it's so important in this case.

>>       for (i = 0; i < nb_clusters; i++) {
>> -        uint64_t old_offset;
>> +        uint64_t old_offset, l2_entry = 0;
>
> I'd rename s/old_offset/old_l2_entry

I think we can get rid of old_offset altogether. I'll think of a way to
restructure the logic along the lines that you suggest.

Thanks!

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-04-22 11:35   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-22 17:42     ` Alberto Garcia
  2020-04-22 18:09       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-22 17:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 22 Apr 2020 01:35:25 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> 17.03.2020 21:16, Alberto Garcia wrote:
>> Two changes are needed in this function:
>> 
>> 1) A full discard deallocates a cluster so we can skip the operation if
>>     it is already unallocated. With extended L2 entries however if any
>>     of the subclusters has the 'all zeroes' bit set then we have to
>>     clear it.
>> 
>> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>>     image has extended L2 entries. Instead, the individual 'all zeroes'
>>     bits must be used.
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   block/qcow2-cluster.c | 18 +++++++++++++++---
>>   1 file changed, 15 insertions(+), 3 deletions(-)
>> 
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 746006a117..824c710760 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>            * TODO We might want to use bdrv_block_status(bs) here, but we're
>>            * holding s->lock, so that doesn't work today.
>>            *
>> -         * If full_discard is true, the sector should not read back as zeroes,
>> +         * If full_discard is true, the cluster should not read back as zeroes,
>>            * but rather fall through to the backing file.
>>            */
>>           switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>           case QCOW2_CLUSTER_UNALLOCATED:
>> -            if (full_discard || !bs->backing) {
>> +            if (full_discard) {
>> +                /* If the image has extended L2 entries we can only
>> +                 * skip this operation if the L2 bitmap is zero. */
>> +                uint64_t bitmap = has_subclusters(s) ?
>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>> +                if (bitmap == 0) {
>> +                    continue;
>> +                }
>> +            } else if (!bs->backing) {
>>                   continue;
>>               }
>
> Hmm, so you do continue if full_discard is false AND bitmap != 0 &
> !bs->backing,

> but you do not continue if full_discard is true AND bitmap != 0 &
> !bs->backing (as you will not go to "else") branch.

1. If full_discard is true it means that the entry and the bitmap should
   always be set to 0, regardless of whether there's a backing file or
   any other consideration.

   This is used e.g. when shrinking an image, or by qcow2_make_empty().

   We can only skip this operation if both the entry and the bitmap are
   already 0 (the former we know because of QCOW2_CLUSTER_UNALLOCATED).

2. If full_discard is false it means that we must ensure that the
   cluster reads back as zeroes, but there's no need to clear the bitmap
   (in fact we must set QCOW_OFLAG_ZERO or QCOW_L2_BITMAP_ALL_ZEROES
   depending on the type of image).

   We can skip this operation if there's no backing file and the cluster
   is already unallocated (because then we know that it already reads as
   zeroes).

   One optimization would be to skip the operation also if the image has
   subclusters and the bitmap is QCOW_L2_BITMAP_ALL_ZEROES, I can do
   that for the next version.
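
The two cases above can be condensed into a small standalone model of the QCOW2_CLUSTER_UNALLOCATED branch. This is only a sketch: the function and parameter names are made up, and it does not include the QCOW_L2_BITMAP_ALL_ZEROES optimization mentioned in point 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the QCOW2_CLUSTER_UNALLOCATED case only: the L2 entry is
 * already known to be unallocated. Returns true if the discard can
 * skip this cluster. All names here are illustrative.
 */
static bool can_skip_unallocated(bool full_discard, bool has_backing,
                                 bool has_subclusters, uint64_t l2_bitmap)
{
    if (!has_subclusters) {
        assert(l2_bitmap == 0); /* no bitmap without extended L2 entries */
    }
    if (full_discard) {
        /* Entry and bitmap must both end up 0: skip only if they are. */
        return l2_bitmap == 0;
    }
    /* Otherwise the cluster only has to read back as zeroes. */
    return !has_backing;
}
```

In this model a full discard with a non-zero bitmap and no backing file is not skipped, while a regular discard in the same situation is.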

> In case QCOW2_CLUSTER_ZERO_PLAIN, worth assert !has_subclusters(s) or
> mark image corrupted ?

I think that should be handled directly in qcow2_get_cluster_type().

There's currently an inconsistency now that I think of it: if an image
has subclusters and QCOW_OFLAG_ZERO set then qcow2_get_cluster_type()
returns QCOW2_CLUSTER_ZERO_* but qcow2_get_subcluster_type() returns
QCOW2_SUBCLUSTER_INVALID.

Two alternatives:

  - We add QCOW2_CLUSTER_INVALID so we get an error in both
    cases. Problem: any function that calls qcow2_get_cluster_type()
    should be modified to handle that.

  - We ignore QCOW_OFLAG_ZERO. Simpler, and it would allow us to use
    that bit in the future if we wanted.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-04-22 17:42     ` Alberto Garcia
@ 2020-04-22 18:09       ` Vladimir Sementsov-Ogievskiy
  2020-04-23 14:18         ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-22 18:09 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

22.04.2020 20:42, Alberto Garcia wrote:
> On Wed 22 Apr 2020 01:35:25 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> 17.03.2020 21:16, Alberto Garcia wrote:
>>> Two changes are needed in this function:
>>>
>>> 1) A full discard deallocates a cluster so we can skip the operation if
>>>      it is already unallocated. With extended L2 entries however if any
>>>      of the subclusters has the 'all zeroes' bit set then we have to
>>>      clear it.
>>>
>>> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>>>      image has extended L2 entries. Instead, the individual 'all zeroes'
>>>      bits must be used.
>>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>    block/qcow2-cluster.c | 18 +++++++++++++++---
>>>    1 file changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 746006a117..824c710760 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>>             * TODO We might want to use bdrv_block_status(bs) here, but we're
>>>             * holding s->lock, so that doesn't work today.
>>>             *
>>> -         * If full_discard is true, the sector should not read back as zeroes,
>>> +         * If full_discard is true, the cluster should not read back as zeroes,
>>>             * but rather fall through to the backing file.
>>>             */
>>>            switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>>            case QCOW2_CLUSTER_UNALLOCATED:
>>> -            if (full_discard || !bs->backing) {
>>> +            if (full_discard) {
>>> +                /* If the image has extended L2 entries we can only
>>> +                 * skip this operation if the L2 bitmap is zero. */
>>> +                uint64_t bitmap = has_subclusters(s) ?
>>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>>> +                if (bitmap == 0) {
>>> +                    continue;
>>> +                }
>>> +            } else if (!bs->backing) {
>>>                    continue;
>>>                }
>>
>> Hmm, so you do continue if full_discard is false AND bitmap != 0 &
>> !bs->backing,
> 
>> but you do not continue if full_discard is true AND bitmap != 0 &
>> !bs->backing (as you will not go to "else") branch.
> 
> 1. If full_discard is true it means that the entry and the bitmap should
>     always be set to 0, regardless of whether there's a backing file or
>     any other consideration.
> 
>     This is used e.g when shrinking an image, or by qcow2_make_empty().
> 
>     We can only skip this operation if both the entry and the bitmap are
>     already 0 (the former we know because of QCOW2_CLUSTER_UNALLOCATED).

Ah, I understand, sorry. I thought that the behavior was changed accidentally, but it is on purpose. With the old code the cluster is already unallocated, but with subclusters we may have some ZERO_PLAIN subclusters.
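The skip rule for an unallocated cluster discussed above can be sketched as a small predicate. This is only an illustration of the logic in the hunk, not code from the patch: can_skip_unallocated() and its boolean parameters are hypothetical names standing in for full_discard, has_subclusters(s), bs->backing and the value returned by get_l2_bitmap().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): decide whether the discard
 * loop may skip an L2 entry whose cluster type is UNALLOCATED.
 */
bool can_skip_unallocated(bool full_discard, bool has_subclusters,
                          bool has_backing, uint64_t l2_bitmap)
{
    if (full_discard) {
        /*
         * Entry and bitmap must both end up as 0, so skipping is only
         * safe if the bitmap is already clear. Images without extended
         * L2 entries have no bitmap, which is treated as 0 here.
         */
        return !has_subclusters || l2_bitmap == 0;
    }
    /*
     * Otherwise the cluster only has to read back as zeroes, which an
     * unallocated cluster already does when there is no backing file.
     */
    return !has_backing;
}
```

With full_discard set, a non-zero bitmap forces the entry to be processed even when there is no backing file, which is the behavior change questioned above.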

> 
> 2. If full_discard is false it means that we must ensure that the
>     cluster reads back as zeroes, but there's no need to clear the bitmap
>     (in fact we must set QCOW_OFLAG_ZERO or QCOW_L2_BITMAP_ALL_ZEROES
>     depending on the type of image).
> 
>     We can skip this operation if there's no backing file and the cluster
>     is already unallocated (because then we know that it already reads as
>     zeroes).
> 
>     One optimization would be to skip the operation also if the image has
>     subclusters and the bitmap is QCOW_L2_BITMAP_ALL_ZEROES, I can do
>     that for the next version.
> 
>> In case QCOW2_CLUSTER_ZERO_PLAIN, worth assert !has_subclusters(s) or
>> mark image corrupted ?
> 
> I think that should be handled directly in qcow2_get_cluster_type().
> 
> There's currently an inconsistency now that I think of it: if an image
> has subclusters and QCOW_OFLAG_ZERO set then qcow2_get_cluster_type()
> returns QCOW2_CLUSTER_ZERO_* but qcow2_get_subcluster_type() returns
> QCOW2_SUBCLUSTER_INVALID.
> 
> Two alternatives:
> 
>    - We add QCOW2_CLUSTER_INVALID so we get an error in both
>      cases. Problem: any function that calls qcow2_get_cluster_type()
>      should be modified to handle that.
> 
>    - We ignore QCOW_OFLAG_ZERO. Simpler, and it would allow us to use
>      that bit in the future if we wanted.
> 

Hmm. Actually we don't check the other reserved bits. But the ZERO bit is risky: we may miss data corruption during the transition to the qcow2-subclusters world. So I'm for the first variant if it's not too huge.




-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice()
  2020-04-22 18:09       ` Vladimir Sementsov-Ogievskiy
@ 2020-04-23 14:18         ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-23 14:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 22 Apr 2020 08:09:53 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> There's currently an inconsistency now that I think of it: if an image
>> has subclusters and QCOW_OFLAG_ZERO set then qcow2_get_cluster_type()
>> returns QCOW2_CLUSTER_ZERO_* but qcow2_get_subcluster_type() returns
>> QCOW2_SUBCLUSTER_INVALID.
>> 
>> Two alternatives:
>> 
>>    - We add QCOW2_CLUSTER_INVALID so we get an error in both
>>      cases. Problem: any function that calls qcow2_get_cluster_type()
>>      should be modified to handle that.
>> 
>>    - We ignore QCOW_OFLAG_ZERO. Simpler, and it would allow us to use
>>      that bit in the future if we wanted.
>> 
>
> Hmm. Actually we don't check the other reserved bits. But the ZERO bit
> is risky: we may miss data corruption during the transition to the
> qcow2-subclusters world.

That's the best argument for checking that bit.

> So I'm for the first variant if it's not too huge.

The other problem is that if we ever want to use that bit for something
else then we would need to add an incompatible feature. If we just
ignore it now then we may be able to make it a compatible feature. But
the chances for that are low I think, and we still have 8 available bits
in the L2 entry.
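
A minimal sketch of what the first alternative (rejecting the bit) could look like. The names below are hypothetical and only for illustration; the one fact taken from the format is that QCOW_OFLAG_ZERO is bit 0 of the L2 entry. A real implementation would have to live in qcow2_get_cluster_type() and its callers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QCOW_OFLAG_ZERO, which is bit 0 of a standard L2 entry. */
#define SKETCH_OFLAG_ZERO (1ULL << 0)

/* Hypothetical result type; the thread proposes QCOW2_CLUSTER_INVALID. */
typedef enum {
    SKETCH_ENTRY_OK,
    SKETCH_ENTRY_INVALID,
} SketchEntryCheck;

/*
 * Flag an L2 entry as invalid if the image has extended L2 entries and
 * the cluster-level zero bit is set anyway.
 */
SketchEntryCheck sketch_check_zero_flag(uint64_t l2_entry,
                                        bool has_subclusters)
{
    if (has_subclusters && (l2_entry & SKETCH_OFLAG_ZERO)) {
        return SKETCH_ENTRY_INVALID;
    }
    return SKETCH_ENTRY_OK;
}
```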

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2()
  2020-04-22 12:06   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-23 15:45     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-23 15:45 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 22 Apr 2020 02:06:56 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> 17.03.2020 21:16, Alberto Garcia wrote:
>> Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>> image has subclusters. Instead, the individual 'all zeroes' bits must
>> be used.
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>
> Patch itself seems correct. Still, it would be good to also check whether
> QCOW_OFLAG_ZERO is set in the subclusters case and add a corresponding
> corruptions++, and maybe even fix it (by using
> QCOW_L2_BITMAP_ALL_ZEROES instead)

I'll add it to my TODO list for a later patch.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies()
  2020-04-22 12:38   ` Vladimir Sementsov-Ogievskiy
@ 2020-04-23 15:50     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-23 15:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

On Wed 22 Apr 2020 02:38:54 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> 17.03.2020 21:16, Alberto Garcia wrote:
>> l2meta_cow_start() and l2meta_cow_end() are not necessarily
>> cluster-aligned if the image has subclusters, so update the
>> calculation of old_start and old_end to guarantee that no two requests
>> try to write on the same cluster.
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>
> Somehow, this patch tells me "hey, there may be a lot of other small
> places which we forgot to fix for subclusters, and you have no
> idea how to find and check them all" :) Probably the only way is
> reviewing the whole qcow2 code, but that's too huge a task.. [this is
> just thinking out loud]

:-)

> Actually, you call it "Fix", and it seems to be a fix for your "[PATCH
> v4 17/30] qcow2: Add subcluster support to
> calculate_l2_meta()". Shouldn't it be squashed in?

Maybe it's not a bad idea... I'll have a look.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-03-17 18:16 ` [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
@ 2020-04-24 17:02   ` Alberto Garcia
  2020-04-24 17:11     ` Eric Blake
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-24 17:02 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia <berto@igalia.com> wrote:
> Compressed clusters always have the bitmap part of the extended L2
> entry set to 0.

I was just finishing some improvements to the new code that allows
BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to
entertain the idea of using the L2 bitmap for compressed clusters as
well.

I will make some tests next week, but I would like to know your opinion
in case I'm missing something.

A compressed cluster cannot be divided into subclusters on the image:
you would not be able to allocate or overwrite them separately,
therefore any write request necessarily has to write (or do COW of) the
whole cluster.

However if you consider the uncompressed guest data I don't see any
reason why you wouldn't be able to zeroize or even deallocate individual
subclusters. These operations don't touch the cluster data on disk
anyway, they only touch the L2 metadata in order to change what the
guest sees.

'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
copy on write. The compressed data would remain untouched on disk but
some of the subclusters would have the 'all zeroes' bit set, exactly
like what happens with normal clusters.

I think that this would make the on-disk format a bit simpler in general
(no need to treat compressed clusters differently in some cases) and it
would add a new optimization to compressed images. I just need to make
sure that it doesn't complicate the code (my feeling is that it would
actually simplify it, but I have to see).

Opinions?

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 17:02   ` Alberto Garcia
@ 2020-04-24 17:11     ` Eric Blake
  2020-04-24 17:21       ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Eric Blake @ 2020-04-24 17:11 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 4/24/20 12:02 PM, Alberto Garcia wrote:
> On Tue 17 Mar 2020 07:16:21 PM CET, Alberto Garcia <berto@igalia.com> wrote:
>> Compressed clusters always have the bitmap part of the extended L2
>> entry set to 0.
> 
> I was just finishing some improvements to the new code that allows
> BDRV_REQ_ZERO_WRITE at the subcluster level, and I'm starting to
> entertain the idea of using the L2 bitmap for compressed clusters as
> well.
> 
> I will make some tests next week, but I would like to know your opinion
> in case I'm missing something.
> 
> A compressed cluster cannot be divided into subclusters on the image:
> you would not be able to allocate or overwrite them separately,
> therefore any write request necessarily has to write (or do COW of) the
> whole cluster.
> 
> However if you consider the uncompressed guest data I don't see any
> reason why you wouldn't be able to zeroize or even deallocate individual
> subclusters. These operations don't touch the cluster data on disk
> anyway, they only touch the L2 metadata in order to change what the
> guest sees.
> 
> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
> copy on write. The compressed data would remain untouched on disk but
> some of the subclusters would have the 'all zeroes' bit set, exactly
> like what happens with normal clusters.

It's a special case that avoids COW for write zeroes, but not for 
anything else. The moment you write any data (whether to the 
zero-above-compressed or the regular compressed portion), the entire 
cluster has to be rewritten.  I'm not sure how frequently guests will 
actually have the scenario of doing a zero request on a sub-cluster, but 
at the same time, I can see where you're coming from in stating that if 
it makes management of extended L2 easier to allow zero subclusters on 
top of a compressed cluster, then there's no reason to forbid it.

> 
> I think that this would make the on-disk format a bit simpler in general
> (no need to treat compressed clusters differently in some cases) and it
> would add a new optimization to compressed images. I just need to make
> sure that it doesn't complicate the code (my feeling is that it would
> actually simplify it, but I have to see).
> 
> Opinions?
> 
> Berto
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 17:11     ` Eric Blake
@ 2020-04-24 17:21       ` Alberto Garcia
  2020-04-24 17:44         ` Eric Blake
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-24 17:21 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
>> copy on write. The compressed data would remain untouched on disk but
>> some of the subclusters would have the 'all zeroes' bit set, exactly
>> like what happens with normal clusters.
>
> It's a special case that avoids COW for write zeroes, but not for
> anything else. The moment you write any data (whether to the
> zero-above-compressed or the regular compressed portion), the entire
> cluster has to be rewritten.

That's right but you can still write zeroes without having to rewrite
anything, and read back the zeroes without having to decompress the
data.

> at the same time, I can see where you're coming from in stating that
> if it makes management of extended L2 easier to allow zero subclusters
> on top of a compressed cluster, then there's no reason to forbid it.

I'm not sure if it makes it easier. Some operations are definitely going
to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
subclusters). Or maybe replace QCow2SubclusterType with something
else. I need to evaluate that.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 17:21       ` Alberto Garcia
@ 2020-04-24 17:44         ` Eric Blake
  2020-04-24 17:56           ` Alberto Garcia
  2020-04-24 18:15           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 128+ messages in thread
From: Eric Blake @ 2020-04-24 17:44 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 4/24/20 12:21 PM, Alberto Garcia wrote:
> On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
>>> copy on write. The compressed data would remain untouched on disk but
>>> some of the subclusters would have the 'all zeroes' bit set, exactly
>>> like what happens with normal clusters.
>>
>> It's a special case that avoids COW for write zeroes, but not for
>> anything else. The moment you write any data (whether to the
>> zero-above-compressed or the regular compressed portion), the entire
>> cluster has to be rewritten.
> 
> That's right but you can still write zeroes without having to rewrite
> anything, and read back the zeroes without having to decompress the
> data.
> 
>> at the same time, I can see where you're coming from in stating that
>> if it makes management of extended L2 easier to allow zero subclusters
>> on top of a compressed cluster, then there's no reason to forbid it.
> 
> I'm not sure if it makes it easier. Some operations are definitely going
> to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
> addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
> subclusters). Or maybe replace QCow2SubclusterType with something
> else. I need to evaluate that.

Reading the entire cluster will be interesting - you'll have to 
decompress the entire memory, then overwrite the zeroed portions.  The 
savings in reading occur only when your read is limited to just the 
subclusters that are zeroed.

But then again, even on a regular cluster, read has to pay attention to 
which subclusters are zeroed, so you already have the workhorse in read 
for detecting whether a normal read is sufficient or if you have to 
follow up with piecing together zeroed sections.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 17:44         ` Eric Blake
@ 2020-04-24 17:56           ` Alberto Garcia
  2020-04-24 18:25             ` Vladimir Sementsov-Ogievskiy
  2020-04-24 18:15           ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-24 17:56 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>> at the same time, I can see where you're coming from in stating that
>>> if it makes management of extended L2 easier to allow zero subclusters
>>> on top of a compressed cluster, then there's no reason to forbid it.
>> 
>> I'm not sure if it makes it easier. Some operations are definitely going
>> to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
>> addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
>> subclusters). Or maybe replace QCow2SubclusterType with something
>> else. I need to evaluate that.
>
> Reading the entire cluster will be interesting - you'll have to
> decompress the entire memory, then overwrite the zeroed portions.

I don't think so, qcow2_get_host_offset() would detect the number of
contiguous subclusters of the same type at the given offset. In this
case they would be _ZERO subclusters so there's no need to decompress
anything, or even read it (it works the same with uncompressed
clusters).
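
As a rough illustration of that idea (not the actual qcow2_get_host_offset() code), counting contiguous subclusters of the same type from an L2 bitmap could look like the sketch below. The type enum is a simplified, hypothetical version of QCow2SubclusterType, and the layout (allocation bits in bits 0-31, 'all zeroes' bits in bits 32-63, both set being invalid) is an assumption of this sketch.

```c
#include <stdint.h>

/* Simplified, hypothetical subcluster types for this sketch only. */
typedef enum {
    SC_UNALLOCATED,
    SC_NORMAL,
    SC_ZERO,
    SC_INVALID,
} SketchScType;

/* Classify subcluster 'sc' (0..31) from the 64-bit L2 bitmap. */
SketchScType sketch_sc_type(uint64_t l2_bitmap, unsigned sc)
{
    int alloc = (l2_bitmap >> sc) & 1;
    int zero = (l2_bitmap >> (sc + 32)) & 1;

    if (alloc && zero) {
        return SC_INVALID;
    }
    if (zero) {
        return SC_ZERO;
    }
    return alloc ? SC_NORMAL : SC_UNALLOCATED;
}

/* Number of subclusters starting at 'sc_from' that share its type. */
unsigned sketch_count_contiguous(uint64_t l2_bitmap, unsigned sc_from)
{
    SketchScType type = sketch_sc_type(l2_bitmap, sc_from);
    unsigned n = 1;

    while (sc_from + n < 32 &&
           sketch_sc_type(l2_bitmap, sc_from + n) == type) {
        n++;
    }
    return n;
}
```

A read that starts inside a run of SC_ZERO subclusters and does not extend past it can be answered without touching the cluster data at all.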

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 17:44         ` Eric Blake
  2020-04-24 17:56           ` Alberto Garcia
@ 2020-04-24 18:15           ` Vladimir Sementsov-Ogievskiy
  2020-04-24 18:41             ` Alberto Garcia
  1 sibling, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24 18:15 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

24.04.2020 20:44, Eric Blake wrote:
> On 4/24/20 12:21 PM, Alberto Garcia wrote:
>> On Fri 24 Apr 2020 07:11:08 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>>> 'write -c 0 64k' followed by 'write -z 16k 16k' would not need to do any
>>>> copy on write. The compressed data would remain untouched on disk but
>>>> some of the subclusters would have the 'all zeroes' bit set, exactly
>>>> like what happens with normal clusters.
>>>
>>> It's a special case that avoids COW for write zeroes, but not for
>>> anything else. The moment you write any data (whether to the
>>> zero-above-compressed or the regular compressed portion), the entire
>>> cluster has to be rewritten.
>>
>> That's right but you can still write zeroes without having to rewrite
>> anything, and read back the zeroes without having to decompress the
>> data.
>>
>>> at the same time, I can see where you're coming from in stating that
>>> if it makes management of extended L2 easier to allow zero subclusters
>>> on top of a compressed cluster, then there's no reason to forbid it.
>>
>> I'm not sure if it makes it easier. Some operations are definitely going
>> to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
>> addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
>> subclusters). Or maybe replace QCow2SubclusterType with something
>> else. I need to evaluate that.
> 
> Reading the entire cluster will be interesting - you'll have to decompress the entire memory, then overwrite the zeroed portions.  The savings in reading occur only when your read is limited to just the subclusters that are zeroed.
> 
> But then again, even on a regular cluster, read has to pay attention to which subclusters are zeroed, so you already have the workhorse in read for detecting whether a normal read is sufficient or if you have to follow up with piecing together zeroed sections.
> 

AFAIK, compressed clusters currently can't be used in scenarios with a guest, as the qcow2 driver doesn't support rewriting them. Or am I wrong? And we normally don't combine normal and compressed clusters in one image. So, currently, the usual use case for compressed clusters is a fully compressed image, written once.

This means that with the current specification subclusters add nothing to this case, and there is no reason to create a compressed image with subclusters. And even if we allow zero/unallocated subclusters, it seems to add nothing to this use case.

So I don't see real benefits of it for now, but neither any problems with it, so I agree that it's mostly about which way is simpler..

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 17:56           ` Alberto Garcia
@ 2020-04-24 18:25             ` Vladimir Sementsov-Ogievskiy
  2020-04-24 18:37               ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-24 18:25 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

24.04.2020 20:56, Alberto Garcia wrote:
> On Fri 24 Apr 2020 07:44:33 PM CEST, Eric Blake <eblake@redhat.com> wrote:
>>>> at the same time, I can see where you're coming from in stating that
>>>> if it makes management of extended L2 easier to allow zero subclusters
>>>> on top of a compressed cluster, then there's no reason to forbid it.
>>>
>>> I'm not sure if it makes it easier. Some operations are definitely going
>>> to be easier but maybe we have to add and handle _ZERO_COMPRESSED in
>>> addition to _ZERO_PLAIN and _ZERO_ALLOC (the same for unallocated
>>> subclusters). Or maybe replace QCow2SubclusterType with something
>>> else. I need to evaluate that.

While reviewing your series it already occurred to me that we are doing too much with the conversion from l2e flags to "type". Is it worth it? All these ZERO_PLAIN and UNALLOCATED_ALLOC, and "case <TYPE>:" lines combined three or four into one case: do they help, or are they extra work? We just have to maintain two views of one model.. But I don't suggest refactoring it in this series :)

>>
>> Reading the entire cluster will be interesting - you'll have to
>> decompress the entire memory, then overwrite the zeroed portions.
> 
> I don't think so, qcow2_get_host_offset() would detect the number of
> contiguous subclusters of the same type at the given offset. In this
> case they would be _ZERO subclusters so there's no need to decompress
> anything, or even read it (it works the same with uncompressed
> clusters).
> 

But if at least one of the subclusters to read is not _ZERO, you'll have to decompress the whole cluster and, after decompression, overwrite the zero subclusters with zeroes, as Eric says.. Or have I lost the thread? :)


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 18:25             ` Vladimir Sementsov-Ogievskiy
@ 2020-04-24 18:37               ` Alberto Garcia
  2020-04-24 18:47                 ` Eric Blake
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-24 18:37 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>> Reading the entire cluster will be interesting - you'll have to
>>> decompress the entire memory, then overwrite the zeroed portions.
>> 
>> I don't think so, qcow2_get_host_offset() would detect the number of
>> contiguous subclusters of the same type at the given offset. In this
>> case they would be _ZERO subclusters so there's no need to decompress
>> anything, or even read it (it works the same with uncompressed
>> clusters).
>
> But if at least one of subclusters to read is not _ZERO, you'll have
> to decompress the whole cluster, and after decompression rewrite
> zero-subclusters by zeroes, as Eric says.. Or I lost the thread:)

I don't see why you would need to rewrite anything... you do have to
decompress the whole cluster, and the uncompressed cluster in memory
would have stale data, but you never need to use that data for anything,
let alone to return it to the guest.

Even if there's a COW, the new cluster would inherit the compressed
cluster's bitmap so the zeroized subclusters still read as zeroes.

It's the same with normal clusters, 'write -P 0xff 0 64k' followed by
'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the
L2 entry indicates that part of it is just zeroes.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 18:15           ` Vladimir Sementsov-Ogievskiy
@ 2020-04-24 18:41             ` Alberto Garcia
  2020-04-25  6:38               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 128+ messages in thread
From: Alberto Garcia @ 2020-04-24 18:41 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
> AFAIK, now compressed clusters can't be used in scenarios with guest,
> as qcow2 driver doesn't support rewriting them.

You can write to those images just fine, it's just not efficient because
you have to COW the compressed clusters.

> Or am I wrong? And we normally don't combine normal and compressed
> clusters together in one image.

As soon as you start writing to an image with compressed clusters you'll
have a combination of both.

But it's true that you don't have an image with compressed clusters if
what you're looking for is performance. So I wouldn't add support for
this if it complicates things too much.

Berto


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 18:37               ` Alberto Garcia
@ 2020-04-24 18:47                 ` Eric Blake
  2020-04-27  7:49                   ` Max Reitz
  0 siblings, 1 reply; 128+ messages in thread
From: Eric Blake @ 2020-04-24 18:47 UTC (permalink / raw)
  To: Alberto Garcia, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

On 4/24/20 1:37 PM, Alberto Garcia wrote:
> On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>>> Reading the entire cluster will be interesting - you'll have to
>>>> decompress the entire memory, then overwrite the zeroed portions.
>>>
>>> I don't think so, qcow2_get_host_offset() would detect the number of
>>> contiguous subclusters of the same type at the given offset. In this
>>> case they would be _ZERO subclusters so there's no need to decompress
>>> anything, or even read it (it works the same with uncompressed
>>> clusters).
>>
>> But if at least one of subclusters to read is not _ZERO, you'll have
>> to decompress the whole cluster, and after decompression rewrite
>> zero-subclusters by zeroes, as Eric says.. Or I lost the thread:)
> 
> I don't see why you would need to rewrite anything... you do have to
> decompress the whole cluster, and the uncompressed cluster in memory
> would have stale data, but you never need to use that data for anything,
> let alone to return it to the guest.
> 
> Even if there's a COW, the new cluster would inherit the compressed
> cluster's bitmap so the zeroized subclusters still read as zeroes.
> 
> It's the same with normal clusters, 'write -P 0xff 0 64k' followed by
> 'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the
> L2 entry indicates that part of it is just zeroes.

The point is this:  Consider 'write -P 0xff 0 64k', then 'write -z 16k 
16k', then 'read 0 64k'.  For normal clusters, we can just do a 
scatter-gather iov read of read 0-16k and 32-64k, plus a memset of 
16-32k.  But for compressed clusters, we have to read and decompress the 
entire 64k, AND also memset 16k-32k.  But if zeroing after reading is 
not that expensive, then the same technique for normal clusters is fine 
(instead of a scatter-gather read of 48k, just read the whole 64k 
cluster before doing the memset).  So the question at hand is not what 
happens in writing, but in reading, and whether we are penalizing reads 
from a compressed cluster or even from regular clusters, when reading 
from a cluster where subclusters have different status.
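
The "read the whole cluster, then memset" variant described above can be sketched as follows. This is a standalone illustration, not qcow2 code: sketch_apply_zero_bits() is a hypothetical name, the buffer is assumed to already hold the full (decompressed) cluster, and the 'all zeroes' bits are assumed to live in bits 32-63 of the L2 bitmap with 32 subclusters per cluster.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SUBCLUSTERS 32   /* assumed subclusters per cluster */

/*
 * Given a buffer holding one full cluster of guest data, zero out every
 * subcluster whose 'all zeroes' bit is set in the L2 bitmap.
 */
void sketch_apply_zero_bits(uint8_t *buf, size_t cluster_size,
                            uint64_t l2_bitmap)
{
    size_t sc_size = cluster_size / SKETCH_SUBCLUSTERS;

    for (unsigned sc = 0; sc < SKETCH_SUBCLUSTERS; sc++) {
        if ((l2_bitmap >> (sc + 32)) & 1) {
            memset(buf + sc * sc_size, 0, sc_size);
        }
    }
}
```

For a 64k cluster the subclusters are 2k each, so 'write -z 16k 16k' corresponds to subclusters 8 to 15, and only that 16k region of the buffer is overwritten.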

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2020-03-17 18:16 ` [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
@ 2020-04-24 19:39   ` Eric Blake
  2020-04-27 13:17     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Eric Blake @ 2020-04-24 19:39 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On 3/17/20 1:16 PM, Alberto Garcia wrote:
> The L2 bitmap needs to be updated after each write to indicate what
> new subclusters are now allocated.
> 
> This needs to happen even if the cluster was already allocated and the
> L2 entry was otherwise valid.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>   block/qcow2-cluster.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index ceacd91ea3..dfd8b66958 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1006,6 +1006,23 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>           assert((offset & L2E_OFFSET_MASK) == offset);
>   
>           set_l2_entry(s, l2_slice, l2_index + i, offset | QCOW_OFLAG_COPIED);
> +
> +        /* Update bitmap with the subclusters that were just written */
> +        if (has_subclusters(s)) {
> +            unsigned written_from = m->cow_start.offset;
> +            unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
> +                m->nb_clusters << s->cluster_bits;
> +            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
> +            int sc;
> +            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
> +                int sc_off = i * s->cluster_size + sc * s->subcluster_size;
> +                if (sc_off >= written_from && sc_off < written_to) {
> +                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
> +                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
> +                }
> +            }

Are there more efficient ways to set this series of bits than iterating 
one bit at a time, while still remaining legible?  For example, what if 
we had something like:

l2_bitmap = get_l2_bitmap(...);
int sc_from = OFFSET_TO_SC(written_from);
int sc_to = OFFSET_TO_SC(written_to - 1);
l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(sc_from, sc_to);
l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(sc_from, sc_to);

which would require macros:

#define OFFSET_TO_SC(offset) ((offset) >> (s->cluster_bits - 5))
#define QCOW_OFLAG_SUB_ALLOC_RANGE(from, to) \
   deposit64(0, (from), (to) - (from) + 1, -1)
#define QCOW_OFLAG_SUB_ZERO_RANGE(from, to) \
   deposit64(0, (from) + 32, (to) - (from) + 1, -1)
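[Editorial sketch, not part of the original message.] The range idea above can be tried outside QEMU with a few standalone helpers. This is only a minimal sketch under stated assumptions: 32 subclusters per cluster, allocation bits in bits 0..31 and zero bits in bits 32..63, and an inclusive subcluster range. The names ones_range(), offset_to_sc(), sub_alloc_range(), sub_zero_range() and mark_written() are hypothetical, and ones_range() merely stands in for QEMU's deposit64() with an all-ones field.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32 subclusters per cluster, i.e. 5 bits of subcluster index. */
#define SC_INDEX_BITS 5

/* Hypothetical stand-in for deposit64(0, start, length, -1). */
static uint64_t ones_range(unsigned start, unsigned length)
{
    assert(start < 64 && length > 0 && length <= 64 - start);
    uint64_t mask = (length == 64) ? ~0ULL : ((1ULL << length) - 1);
    return mask << start;
}

/* Subcluster index of a guest offset within its own cluster. */
static unsigned offset_to_sc(uint64_t offset, unsigned cluster_bits)
{
    uint64_t in_cluster = offset & ((1ULL << cluster_bits) - 1);
    return (unsigned)(in_cluster >> (cluster_bits - SC_INDEX_BITS));
}

/* Allocation bits for subclusters from..to (both inclusive). */
static uint64_t sub_alloc_range(unsigned from, unsigned to)
{
    assert(from <= to && to < 32);
    return ones_range(from, to - from + 1);
}

/* Zero bits for the same range live 32 bits higher. */
static uint64_t sub_zero_range(unsigned from, unsigned to)
{
    return sub_alloc_range(from, to) << 32;
}

/* Mark subclusters from..to as allocated and no longer reading as zero. */
static uint64_t mark_written(uint64_t l2_bitmap, unsigned from, unsigned to)
{
    l2_bitmap |= sub_alloc_range(from, to);
    l2_bitmap &= ~sub_zero_range(from, to);
    return l2_bitmap;
}
```

For instance, starting from a bitmap where subclusters 8..15 read as zeroes and writing to subclusters 8..11, mark_written(sub_zero_range(8, 15), 8, 11) yields 0x0000f00000000f00: four allocation bits set and the zero bits of subclusters 12..15 left in place. A request spanning several clusters would still need its range clamped per cluster before calling such helpers.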


> +            set_l2_bitmap(s, l2_slice, l2_index + i, l2_bitmap);

I'm hoping this function doesn't cause redundant I/O if the L2 entry 
didn't actually change.  But that's not the concern for this patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 18:41             ` Alberto Garcia
@ 2020-04-25  6:38               ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-25  6:38 UTC (permalink / raw)
  To: Alberto Garcia, Eric Blake, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block, Max Reitz

24.04.2020 21:41, Alberto Garcia wrote:
> On Fri 24 Apr 2020 08:15:04 PM CEST, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>> AFAIK, now compressed clusters can't be used in scenarios with guest,
>> as qcow2 driver doesn't support rewriting them.
> 
> You can write to those images just fine, it's just not efficient because
> you have to COW the compressed clusters.

No, rewriting doesn't work:

[root@kvm master]# ./qemu-img create -f qcow2 x 10M
Formatting 'x', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16
[root@kvm master]# ./qemu-io -c 'write -c 0 64K' x
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 00.23 sec (278.708 KiB/sec and 4.3548 ops/sec)
[root@kvm master]# ./qemu-io -c 'write -c 0 64K' x
write failed: Input/output error


> 
>> Or am I wrong? And we normally don't combine normal and compressed
>> clusters together in one image.
> 
> As soon as you start writing to an image with compressed clusters you'll
> have a combination of both.

Ah, you mean rewriting compressed clusters with normal ones. So the
use case is to take a compressed backup image and use it for the guest
directly, instead of converting it first. That makes sense.

> 
> But it's true that you don't have an image with compressed clusters if
> what you're looking for is performance. So I wouldn't add support for
> this if it complicates things too much.
> 
> Berto
> 


-- 
Best regards,
Vladimir



* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-24 18:47                 ` Eric Blake
@ 2020-04-27  7:49                   ` Max Reitz
  2020-04-27 18:12                     ` Alberto Garcia
  0 siblings, 1 reply; 128+ messages in thread
From: Max Reitz @ 2020-04-27  7:49 UTC (permalink / raw)
  To: Eric Blake, Alberto Garcia, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block



On 24.04.20 20:47, Eric Blake wrote:
> On 4/24/20 1:37 PM, Alberto Garcia wrote:
>> On Fri 24 Apr 2020 08:25:45 PM CEST, Vladimir Sementsov-Ogievskiy
>> <vsementsov@virtuozzo.com> wrote:
>>>>> Reading the entire cluster will be interesting - you'll have to
>>>>> decompress the entire memory, then overwrite the zeroed portions.
>>>>
>>>> I don't think so, qcow2_get_host_offset() would detect the number of
>>>> contiguous subclusters of the same type at the given offset. In this
>>>> case they would be _ZERO subclusters so there's no need to decompress
>>>> anything, or even read it (it works the same with uncompressed
>>>> clusters).
>>>
>>> But if at least one of subclusters to read is not _ZERO, you'll have
>>> to decompress the whole cluster, and after decompression rewrite
>>> zero-subclusters by zeroes, as Eric says.. Or I lost the thread:)
>>
>> I don't see why you would need to rewrite anything... you do have to
>> decompress the whole cluster, and the uncompressed cluster in memory
>> would have stale data, but you never need to use that data for anything,
>> let alone to return it to the guest.
>>
>> Even if there's a COW, the new cluster would inherit the compressed
>> cluster's bitmap so the zeroized subclusters still read as zeroes.
>>
>> It's the same with normal clusters, 'write -P 0xff 0 64k' followed by
>> 'write -z 16k 16k'. The host cluster on disk still reads as 0xff but the
>> L2 entry indicates that part of it is just zeroes.
> 
> The point is this:  Consider 'write -P 0xff 0 64k', then 'write -z 16k
> 16k', then 'read 0 64k'.  For normal clusters, we can just do a
> scatter-gather iov read of read 0-16k and 32-64k, plus a memset of
> 16-32k.  But for compressed clusters, we have to read and decompress the
> entire 64k, AND also memset 16k-32k.  But if zeroing after reading is
> not that expensive, then the same technique for normal clusters is fine
> (instead of a scatter-gather read of 48k, just read the whole 64k
> cluster before doing the memset).

It would also mean letting qcow2_co_preadv_part() special-handle such
cases, i.e., whenever the whole cluster is compressed, it needs to read
it as a whole, regardless of the subcluster status, and then memset()
all areas to zero that are all-zero subclusters.  Otherwise we’d read
and decompress the whole buffer twice (once for 0 to 16k, once for 32k
to 64k).

This may be complicated a bit by the task schema, i.e. that reads are
scheduled in the task pool.  For qcow2_co_preadv_part() to memset some
area after decompression, it would need to wait on the read_compressed
task, which would make the whole task pool thing moot (for compressed
clusters).  Or it just does the memset() at the end, when we have to
settle the task pool anyway, but then it would have to remember all
areas it still needs to zero.

Hm, or, qcow2_co_preadv_compressed() could figure out where the zeroed
subclusters are and then memset() them itself, e.g. by receiving the
subcluster bitmap.  Probably the simplest implementation, but it seems a
bit like a layering breach.
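
[Editorial sketch, not part of the original message.] The last option, where the compressed-read path receives the subcluster bitmap and zeroes the relevant areas itself, could look roughly like the following. It is only an illustration: zero_marked_subclusters() is a hypothetical helper, and it assumes 32 subclusters per cluster with the zero bits stored in bits 32..63 of the bitmap. It expects a buffer that already holds the whole decompressed cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32 subclusters per cluster, as discussed in this series. */
#define SUBCLUSTERS_PER_CLUSTER 32

/*
 * Overwrite with zeroes every subcluster of a decompressed cluster whose
 * "reads as zero" bit (bit 32 + index) is set in l2_bitmap.
 * Hypothetical helper; the cluster is decompressed only once by the caller.
 */
static void zero_marked_subclusters(uint8_t *cluster, size_t cluster_size,
                                    uint64_t l2_bitmap)
{
    assert(cluster_size % SUBCLUSTERS_PER_CLUSTER == 0);
    size_t sc_size = cluster_size / SUBCLUSTERS_PER_CLUSTER;

    for (unsigned sc = 0; sc < SUBCLUSTERS_PER_CLUSTER; sc++) {
        if (l2_bitmap & (1ULL << (32 + sc))) {
            memset(cluster + sc * sc_size, 0, sc_size);
        }
    }
}
```

With a 64k cluster (2k subclusters), the 'write -z 16k 16k' case from earlier in the thread corresponds to zero bits for subclusters 8..15, so bytes 16384..32767 of the buffer are cleared and the rest of the decompressed data is left untouched.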

Not sure how bad the complexity is on the write side for not letting
zero writes just zero the subcluster, but it doesn’t seem to me that the
opposite would come for free on the read side.

Max




* Re: [PATCH v4 25/30] qcow2: Add subcluster support to handle_alloc_space()
  2020-03-17 18:16 ` [PATCH v4 25/30] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
@ 2020-04-27 11:54   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-27 11:54 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> The bdrv_co_pwrite_zeroes() call here fills complete clusters with
> zeroes, but it can happen that some subclusters are not part of the
> write request or the copy-on-write. This patch makes sure that only
> the affected subclusters are overwritten.
> 
> A potential improvement would be to also fill with zeroes the other
> subclusters if we can guarantee that we are not overwriting existing
> data. However this would waste more disk space, so we should first
> evaluate if it's really worth doing.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH v4 26/30] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only
  2020-03-17 18:16 ` [PATCH v4 26/30] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
@ 2020-04-27 11:59   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 128+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-04-27 11:59 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz, Denis V . Lunev

17.03.2020 21:16, Alberto Garcia wrote:
> Ideally it should be possible to zero individual subclusters using
> this function, but this is currently not implemented.
> 
> Signed-off-by: Alberto Garcia<berto@igalia.com>
> Reviewed-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir



* Re: [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2()
  2020-04-24 19:39   ` Eric Blake
@ 2020-04-27 13:17     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-27 13:17 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Anton Nefedov, qemu-block, Max Reitz,
	Vladimir Sementsov-Ogievskiy, Denis V . Lunev

On Fri 24 Apr 2020 09:39:25 PM CEST, Eric Blake wrote:
>> +        /* Update bitmap with the subclusters that were just written */
>> +        if (has_subclusters(s)) {
>> +            unsigned written_from = m->cow_start.offset;
>> +            unsigned written_to = m->cow_end.offset + m->cow_end.nb_bytes ?:
>> +                m->nb_clusters << s->cluster_bits;
>> +            uint64_t l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index + i);
>> +            int sc;
>> +            for (sc = 0; sc < s->subclusters_per_cluster; sc++) {
>> +                int sc_off = i * s->cluster_size + sc * s->subcluster_size;
>> +                if (sc_off >= written_from && sc_off < written_to) {
>> +                    l2_bitmap |= QCOW_OFLAG_SUB_ALLOC(sc);
>> +                    l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO(sc);
>> +                }
>> +            }
>
> Are there more efficient ways to set this series of bits than iterating 
> one bit at a time, while still remaining legible?  For example, what if 
> we had something like:
>
> l2_bitmap = get_l2_bitmap(...);
> int sc_from = OFFSET_TO_SC(written_from);
> int sc_to = OFFSET_TO_SC(written_to - 1);
> l2_bitmap |= QCOW_OFLAG_SUB_ALLOC_RANGE(sc_from, sc_to);
> l2_bitmap &= ~QCOW_OFLAG_SUB_ZERO_RANGE(sc_from, sc_to);

That's a very good suggestion, thanks!

Berto



* Re: [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster
  2020-04-27  7:49                   ` Max Reitz
@ 2020-04-27 18:12                     ` Alberto Garcia
  0 siblings, 0 replies; 128+ messages in thread
From: Alberto Garcia @ 2020-04-27 18:12 UTC (permalink / raw)
  To: Max Reitz, Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel
  Cc: Kevin Wolf, Denis V . Lunev, Anton Nefedov, qemu-block

On Mon 27 Apr 2020 09:49:00 AM CEST, Max Reitz wrote:
>> The point is this: Consider 'write -P 0xff 0 64k', then 'write -z 16k
>> 16k', then 'read 0 64k'. For normal clusters, we can just do a
>> scatter-gather iov read of read 0-16k and 32-64k, plus a memset of
>> 16-32k. But for compressed clusters, we have to read and decompress
>> the entire 64k, AND also memset 16k-32k. But if zeroing after reading
>> is not that expensive, then the same technique for normal clusters is
>> fine (instead of a scatter-gather read of 48k, just read the whole
>> 64k cluster before doing the memset).
>
> It would also mean letting qcow2_co_preadv_part() special-handle such
> cases, i.e., whenever the whole cluster is compressed, it needs to
> read it as a whole, regardless of the subcluster status, and then
> memset() all areas to zero that are all-zero subclusters.  Otherwise
> we’d read and decompress the whole buffer twice (once for 0 to 16k,
> once for 32k to 64k).

This is actually a good reason against adding subcluster allocation to
compressed clusters.

I wouldn't like to complicate the code for this use case, so we either
don't support it at all, or we support it with the problem that you
mention (decompressing the whole buffer more than once if the cluster
contains holes).

> Not sure how bad the complexity is on the write side for not letting
> zero writes just zero the subcluster

It is not bad, I just have to check the cluster type and return
-ENOTSUP.
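
[Editorial sketch, not part of the original message.] The write-side check described here is small. A minimal illustration, assuming a simplified cluster type enum (the real types live in QEMU's qcow2 headers) and a hypothetical function name:

```c
#include <errno.h>

/* Hypothetical, simplified cluster types for illustration only. */
typedef enum {
    CLUSTER_UNALLOCATED,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
} ClusterType;

/*
 * Decide whether a zero write may be served by flipping subcluster bits.
 * Returns 0 if so, or -ENOTSUP for compressed clusters so that the caller
 * can fall back to a regular write path.
 */
static int check_subcluster_zero(ClusterType type)
{
    if (type == CLUSTER_COMPRESSED) {
        return -ENOTSUP;
    }
    return 0;
}
```

Returning -ENOTSUP rather than handling the case keeps compressed clusters free of subcluster state, which is the simpler of the two options weighed above.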

Berto



end of thread, other threads:[~2020-04-27 18:13 UTC | newest]

Thread overview: 128+ messages
2020-03-17 18:15 [PATCH v4 00/30] Add subcluster allocation to qcow2 Alberto Garcia
2020-03-17 18:15 ` [PATCH v4 01/30] qcow2: Make Qcow2AioTask store the full host offset Alberto Garcia
2020-03-18 11:23   ` Eric Blake
2020-04-08 10:23   ` Max Reitz
2020-04-09  6:49   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:15 ` [PATCH v4 02/30] qcow2: Convert qcow2_get_cluster_offset() into qcow2_get_host_offset() Alberto Garcia
2020-03-18 12:08   ` Eric Blake
2020-04-08 10:51   ` Max Reitz
2020-04-08 17:29     ` Alberto Garcia
2020-04-09  7:57     ` Vladimir Sementsov-Ogievskiy
2020-04-09 14:35       ` Alberto Garcia
2020-04-09  7:50   ` Vladimir Sementsov-Ogievskiy
2020-04-09 14:45     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 03/30] qcow2: Add calculate_l2_meta() Alberto Garcia
2020-04-09  8:30   ` Vladimir Sementsov-Ogievskiy
2020-04-09 15:12     ` Alberto Garcia
2020-04-09 18:47       ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 04/30] qcow2: Split cluster_needs_cow() out of count_cow_clusters() Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 05/30] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied() Alberto Garcia
2020-04-09 10:59   ` Vladimir Sementsov-Ogievskiy
2020-04-09 16:08     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 06/30] qcow2: Add get_l2_entry() and set_l2_entry() Alberto Garcia
2020-04-10  8:48   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 07/30] qcow2: Document the Extended L2 Entries feature Alberto Garcia
2020-04-08 11:09   ` Max Reitz
2020-04-09 15:12   ` Eric Blake
2020-04-10  9:29     ` Vladimir Sementsov-Ogievskiy
2020-04-14 14:50       ` Alberto Garcia
2020-04-14 16:19         ` Vladimir Sementsov-Ogievskiy
2020-04-14 16:30           ` Alberto Garcia
2020-04-14 18:06             ` Vladimir Sementsov-Ogievskiy
2020-04-14 18:13               ` Alberto Garcia
2020-04-15 19:11       ` Alberto Garcia
2020-04-15 21:13         ` Eric Blake
2020-04-10 12:01     ` Alberto Garcia
2020-04-14 18:16     ` Alberto Garcia
2020-04-14 18:23       ` Eric Blake
2020-04-14 18:25         ` Eric Blake
2020-03-17 18:16 ` [PATCH v4 08/30] qcow2: Add dummy has_subclusters() function Alberto Garcia
2020-04-10  9:11   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 09/30] qcow2: Add subcluster-related fields to BDRVQcow2State Alberto Garcia
2020-04-08 11:12   ` Max Reitz
2020-04-10  9:45   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 10/30] qcow2: Add offset_to_sc_index() Alberto Garcia
2020-04-13 11:02   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 11/30] qcow2: Add l2_entry_size() Alberto Garcia
2020-04-14  9:44   ` Vladimir Sementsov-Ogievskiy
2020-04-14 12:20     ` Alberto Garcia
2020-04-14 12:29       ` Vladimir Sementsov-Ogievskiy
2020-04-14 12:33         ` Alberto Garcia
2020-04-14 12:39           ` Vladimir Sementsov-Ogievskiy
2020-04-14 16:01       ` Eric Blake
2020-04-14 16:16         ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 12/30] qcow2: Update get/set_l2_entry() and add get/set_l2_bitmap() Alberto Garcia
2020-04-14  9:49   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 13/30] qcow2: Add QCow2SubclusterType and qcow2_get_subcluster_type() Alberto Garcia
2020-04-08 11:23   ` Max Reitz
2020-04-08 17:46     ` Alberto Garcia
2020-04-09  8:22       ` Max Reitz
2020-04-14 11:10   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 14/30] qcow2: Add cluster type parameter to qcow2_get_host_offset() Alberto Garcia
2020-04-08 12:15   ` Max Reitz
2020-04-14 12:30   ` Vladimir Sementsov-Ogievskiy
2020-04-14 12:38     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 15/30] qcow2: Replace QCOW2_CLUSTER_* with QCOW2_SUBCLUSTER_* Alberto Garcia
2020-04-08 12:42   ` Max Reitz
2020-04-15  7:10   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 16/30] qcow2: Handle QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC Alberto Garcia
2020-04-15  7:28   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 17/30] qcow2: Add subcluster support to calculate_l2_meta() Alberto Garcia
2020-04-15  8:39   ` Vladimir Sementsov-Ogievskiy
2020-04-16 20:01     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 18/30] qcow2: Add subcluster support to qcow2_get_host_offset() Alberto Garcia
2020-04-08 12:49   ` Max Reitz
2020-04-08 17:35     ` Alberto Garcia
2020-04-22  8:07   ` Vladimir Sementsov-Ogievskiy
2020-04-22 11:54     ` Alberto Garcia
2020-04-22 12:18       ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 19/30] qcow2: Add subcluster support to zero_in_l2_slice() Alberto Garcia
2020-04-22 11:06   ` Vladimir Sementsov-Ogievskiy
2020-04-22 12:53     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 20/30] qcow2: Add subcluster support to discard_in_l2_slice() Alberto Garcia
2020-04-09 10:05   ` Max Reitz
2020-04-10 12:47     ` Alberto Garcia
2020-04-14 10:13       ` Max Reitz
2020-04-22 11:35   ` Vladimir Sementsov-Ogievskiy
2020-04-22 17:42     ` Alberto Garcia
2020-04-22 18:09       ` Vladimir Sementsov-Ogievskiy
2020-04-23 14:18         ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 21/30] qcow2: Add subcluster support to check_refcounts_l2() Alberto Garcia
2020-04-22 12:06   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:45     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 22/30] qcow2: Fix offset calculation in handle_dependencies() Alberto Garcia
2020-04-22 12:38   ` Vladimir Sementsov-Ogievskiy
2020-04-23 15:50     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 23/30] qcow2: Update L2 bitmap in qcow2_alloc_cluster_link_l2() Alberto Garcia
2020-04-24 19:39   ` Eric Blake
2020-04-27 13:17     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 24/30] qcow2: Clear the L2 bitmap when allocating a compressed cluster Alberto Garcia
2020-04-24 17:02   ` Alberto Garcia
2020-04-24 17:11     ` Eric Blake
2020-04-24 17:21       ` Alberto Garcia
2020-04-24 17:44         ` Eric Blake
2020-04-24 17:56           ` Alberto Garcia
2020-04-24 18:25             ` Vladimir Sementsov-Ogievskiy
2020-04-24 18:37               ` Alberto Garcia
2020-04-24 18:47                 ` Eric Blake
2020-04-27  7:49                   ` Max Reitz
2020-04-27 18:12                     ` Alberto Garcia
2020-04-24 18:15           ` Vladimir Sementsov-Ogievskiy
2020-04-24 18:41             ` Alberto Garcia
2020-04-25  6:38               ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 25/30] qcow2: Add subcluster support to handle_alloc_space() Alberto Garcia
2020-04-27 11:54   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 26/30] qcow2: Restrict qcow2_co_pwrite_zeroes() to full clusters only Alberto Garcia
2020-04-27 11:59   ` Vladimir Sementsov-Ogievskiy
2020-03-17 18:16 ` [PATCH v4 27/30] qcow2: Assert that expand_zero_clusters_in_l1() does not support subclusters Alberto Garcia
2020-04-09 10:27   ` Max Reitz
2020-04-10 16:42     ` Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 28/30] qcow2: Add the 'extended_l2' option and the QCOW2_INCOMPAT_EXTL2 bit Alberto Garcia
2020-04-09 14:49   ` Eric Blake
2020-03-17 18:16 ` [PATCH v4 29/30] qcow2: Add subcluster support to qcow2_measure() Alberto Garcia
2020-03-17 18:16 ` [PATCH v4 30/30] iotests: Add tests for qcow2 images with extended L2 entries Alberto Garcia
2020-04-09 12:22   ` Max Reitz
2020-04-13 17:16     ` Alberto Garcia
2020-04-14 10:14       ` Max Reitz
2020-04-21  5:06 ` [PATCH v4 00/30] Add subcluster allocation to qcow2 Derek Su
2020-04-21 10:35   ` Alberto Garcia
