* [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests]
@ 2017-05-04  3:07 Eric Blake
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation Eric Blake
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

I've collected several improvements for qcow2 zero-cluster handling.

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v12

Marked as v12 for "hysterical raisins", since it is the half of
v10 [1] that was not resubmitted as v11 [2].

Depends on Max's block tree:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00641.html
and on Max's qcow2 cleanups:
https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00689.html

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05227.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05896.html

Changes since last posting:
- lots of tweaks to resolve Max's review findings, including
  rewriting my additions to test 154
- a new patch splitting QCOW2_CLUSTER_ZERO that fell out from
  my review of Max's work
- defer any optimizations of a backing file with different length
  until later (I will still post an RFC patch to explore what
  optimizations a BDRV_BLOCK_EOF would allow, but didn't want to
  hold up this series any further)

001/10:[down] 'qcow2: Use consistent switch indentation'
002/10:[0043] [FC] 'block: Update comments on BDRV_BLOCK_* meanings'
003/10:[0023] [FC] 'qcow2: Correctly report status of preallocated zero clusters'
004/10:[down] 'qcow2: Make distinction between zero cluster types obvious'
005/10:[0008] [FC] 'qcow2: Optimize zero_single_l2() to minimize L2 churn'
006/10:[down] 'iotests: Improve _filter_qemu_img_map'
007/10:[0107] [FC] 'iotests: Add test 179 to cover write zeroes with unmap'
008/10:[0260] [FC] 'qcow2: Optimize write zero of unaligned tail cluster'
009/10:[----] [--] 'qcow2: Assert that cluster operations are aligned'
010/10:[0005] [FC] 'qcow2: Discard/zero clusters by byte count'

Eric Blake (10):
  qcow2: Use consistent switch indentation
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Correctly report status of preallocated zero clusters
  qcow2: Make distinction between zero cluster types obvious
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  iotests: Improve _filter_qemu_img_map
  iotests: Add test 179 to cover write zeroes with unmap
  qcow2: Optimize write zero of unaligned tail cluster
  qcow2: Assert that cluster operations are aligned
  qcow2: Discard/zero clusters by byte count

 block/qcow2.h                    |  17 +++--
 include/block/block.h            |  35 +++++----
 include/block/block_int.h        |   7 ++
 block/qcow2-cluster.c            | 156 ++++++++++++++++++++++----------------
 block/qcow2-refcount.c           | 124 +++++++++++++++---------------
 block/qcow2-snapshot.c           |   7 +-
 block/qcow2.c                    |  38 ++++++----
 tests/qemu-iotests/common.filter |   4 +-
 tests/qemu-iotests/122.out       |  16 ++--
 tests/qemu-iotests/154           | 160 ++++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/154.out       | 159 ++++++++++++++++++++++++++++++++++----
 tests/qemu-iotests/179           | 132 ++++++++++++++++++++++++++++++++
 tests/qemu-iotests/179.out       | 160 +++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group         |   1 +
 14 files changed, 818 insertions(+), 198 deletions(-)
 create mode 100755 tests/qemu-iotests/179
 create mode 100644 tests/qemu-iotests/179.out

-- 
2.9.3


* [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 19:42   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Fix a couple of inconsistent indentations, before an upcoming
patch further tweaks the switch statements.  While at it, make
some tweaks for shorter lines to keep checkpatch happy (best
viewed with 'git diff -b').

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: new patch
---
 block/qcow2-cluster.c  | 32 ++++++++---------
 block/qcow2-refcount.c | 96 ++++++++++++++++++++++++++------------------------
 2 files changed, 65 insertions(+), 63 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 31077d8..335a505 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1504,25 +1504,25 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
          * but rather fall through to the backing file.
          */
         switch (qcow2_get_cluster_type(old_l2_entry)) {
-            case QCOW2_CLUSTER_UNALLOCATED:
-                if (full_discard || !bs->backing) {
-                    continue;
-                }
-                break;
+        case QCOW2_CLUSTER_UNALLOCATED:
+            if (full_discard || !bs->backing) {
+                continue;
+            }
+            break;

-            case QCOW2_CLUSTER_ZERO:
-                /* Preallocated zero clusters should be discarded in any case */
-                if (!full_discard && (old_l2_entry & L2E_OFFSET_MASK) == 0) {
-                    continue;
-                }
-                break;
+        case QCOW2_CLUSTER_ZERO:
+            /* Preallocated zero clusters should be discarded in any case */
+            if (!full_discard && (old_l2_entry & L2E_OFFSET_MASK) == 0) {
+                continue;
+            }
+            break;

-            case QCOW2_CLUSTER_NORMAL:
-            case QCOW2_CLUSTER_COMPRESSED:
-                break;
+        case QCOW2_CLUSTER_NORMAL:
+        case QCOW2_CLUSTER_COMPRESSED:
+            break;

-            default:
-                abort();
+        default:
+            abort();
         }

         /* First remove L2 entries */
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 4efca7e..a5a0076 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1117,70 +1117,72 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                 goto fail;
             }

-            for(j = 0; j < s->l2_size; j++) {
+            for (j = 0; j < s->l2_size; j++) {
                 uint64_t cluster_index;
+                uint64_t offset_masked;

                 offset = be64_to_cpu(l2_table[j]);
                 old_offset = offset;
+                offset_masked = offset & L2E_OFFSET_MASK;
                 offset &= ~QCOW_OFLAG_COPIED;

                 switch (qcow2_get_cluster_type(offset)) {
-                    case QCOW2_CLUSTER_COMPRESSED:
-                        nb_csectors = ((offset >> s->csize_shift) &
-                                       s->csize_mask) + 1;
-                        if (addend != 0) {
-                            ret = update_refcount(bs,
-                                (offset & s->cluster_offset_mask) & ~511,
-                                nb_csectors * 512, abs(addend), addend < 0,
-                                QCOW2_DISCARD_SNAPSHOT);
-                            if (ret < 0) {
-                                goto fail;
-                            }
-                        }
-                        /* compressed clusters are never modified */
-                        refcount = 2;
-                        break;
-
-                    case QCOW2_CLUSTER_NORMAL:
-                    case QCOW2_CLUSTER_ZERO:
-                        if (offset_into_cluster(s, offset & L2E_OFFSET_MASK)) {
-                            qcow2_signal_corruption(bs, true, -1, -1, "Data "
-                                                    "cluster offset %#llx "
-                                                    "unaligned (L2 offset: %#"
-                                                    PRIx64 ", L2 index: %#x)",
-                                                    offset & L2E_OFFSET_MASK,
-                                                    l2_offset, j);
-                            ret = -EIO;
+                case QCOW2_CLUSTER_COMPRESSED:
+                    nb_csectors = ((offset >> s->csize_shift) &
+                                   s->csize_mask) + 1;
+                    if (addend != 0) {
+                        ret = update_refcount(bs,
+                            (offset & s->cluster_offset_mask) & ~511,
+                            nb_csectors * 512, abs(addend), addend < 0,
+                            QCOW2_DISCARD_SNAPSHOT);
+                        if (ret < 0) {
                             goto fail;
                         }
+                    }
+                    /* compressed clusters are never modified */
+                    refcount = 2;
+                    break;

-                        cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
-                        if (!cluster_index) {
-                            /* unallocated */
-                            refcount = 0;
-                            break;
-                        }
-                        if (addend != 0) {
-                            ret = qcow2_update_cluster_refcount(bs,
+                case QCOW2_CLUSTER_NORMAL:
+                case QCOW2_CLUSTER_ZERO:
+                    if (offset_into_cluster(s, offset_masked)) {
+                        qcow2_signal_corruption(bs, true, -1, -1, "Data "
+                                                "cluster offset %#" PRIx64
+                                                " unaligned (L2 offset: %#"
+                                                PRIx64 ", L2 index: %#x)",
+                                                offset_masked,
+                                                l2_offset, j);
+                        ret = -EIO;
+                        goto fail;
+                    }
+
+                    cluster_index = offset_masked >> s->cluster_bits;
+                    if (!cluster_index) {
+                        /* unallocated */
+                        refcount = 0;
+                        break;
+                    }
+                    if (addend != 0) {
+                        ret = qcow2_update_cluster_refcount(bs,
                                     cluster_index, abs(addend), addend < 0,
                                     QCOW2_DISCARD_SNAPSHOT);
-                            if (ret < 0) {
-                                goto fail;
-                            }
-                        }
-
-                        ret = qcow2_get_refcount(bs, cluster_index, &refcount);
                         if (ret < 0) {
                             goto fail;
                         }
-                        break;
+                    }

-                    case QCOW2_CLUSTER_UNALLOCATED:
-                        refcount = 0;
-                        break;
+                    ret = qcow2_get_refcount(bs, cluster_index, &refcount);
+                    if (ret < 0) {
+                        goto fail;
+                    }
+                    break;

-                    default:
-                        abort();
+                case QCOW2_CLUSTER_UNALLOCATED:
+                    refcount = 0;
+                    break;
+
+                default:
+                    abort();
                 }

                 if (refcount == 1) {
-- 
2.9.3


* [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 20:06   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 03/10] qcow2: Correctly report status of preallocated zero clusters Eric Blake
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
always meant raw data could be read directly.  Furthermore, the
text refers a lot to bs->file, even though the interface was
updated back in 67a0fd2a to let the driver pass back which BDS the
offset refers to (not necessarily bs->file).  As the 8-way table is
the intended semantics, simplify the rest of the text to get rid of
the confusion.

ALLOCATED is always set by the block layer for convenience (drivers
do not have to worry about it). RAW is used only internally, but
by more than the raw driver.  Document these additional items on
the driver callback.
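
As a rough illustration of the split described above (not code from
this patch), the following standalone sketch shows the block layer
deriving ALLOCATED from DATA || ZERO on behalf of a driver, and a
caller checking whether the returned offset can be read directly.
The flag values are assumed to match include/block/block.h at this
revision; the sketch_* names and SKETCH_OFFSET_MASK are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed to match include/block/block.h at this revision. */
#define BDRV_BLOCK_DATA         0x01
#define BDRV_BLOCK_ZERO         0x02
#define BDRV_BLOCK_OFFSET_VALID 0x04
#define BDRV_BLOCK_ALLOCATED    0x10

/* Hypothetical mask covering bits 9-62, where the offset is carried. */
#define SKETCH_OFFSET_MASK 0x7ffffffffffffe00ULL

/* Block-layer side: ALLOCATED is shorthand for DATA || ZERO. */
static int64_t sketch_add_allocated(int64_t driver_status)
{
    /* Drivers are expected to leave ALLOCATED unset. */
    assert(!(driver_status & BDRV_BLOCK_ALLOCATED));
    if (driver_status & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
        driver_status |= BDRV_BLOCK_ALLOCATED;
    }
    return driver_status;
}

/*
 * Per the table, the offset holds the guest-visible contents only
 * when both OFFSET_VALID and DATA are set; with DATA clear, the
 * offset is merely reserved and may contain garbage.
 */
static bool sketch_offset_is_readable(int64_t status)
{
    return (status & BDRV_BLOCK_OFFSET_VALID) &&
           (status & BDRV_BLOCK_DATA);
}

/* Extract the offset; only meaningful when OFFSET_VALID is set. */
static int64_t sketch_offset(int64_t status)
{
    return status & SKETCH_OFFSET_MASK;
}
```

For example, a status of ZERO | OFFSET_VALID gains ALLOCATED but is
not readable at the offset, which is the "preallocated, read as zero"
row of the table.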

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: even more wording tweaks
v11: reserved for blkdebug half of v10
v10: new patch
---
 include/block/block.h     | 35 +++++++++++++++++++----------------
 include/block/block_int.h |  7 +++++++
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 862eb56..c8bec7d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -120,29 +120,32 @@ typedef struct HDGeometry {
 #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)

 /*
- * Allocation status flags
- * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status.
- * BDRV_BLOCK_ZERO: sectors read as zero
- * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
- *                          bdrv_get_block_status.
+ * Allocation status flags for bdrv_get_block_status() and friends.
+ *
+ * Public flags:
+ * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
+ * BDRV_BLOCK_ZERO: offset reads as zero
+ * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
  * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
- *                       layer (as opposed to the backing file)
- * BDRV_BLOCK_RAW: used internally to indicate that the request
- *                 was answered by the raw driver and that one
- *                 should look in bs->file directly.
+ *                       layer (short for DATA || ZERO), set by block layer
  *
- * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
- * bs->file where sector data can be read from as raw data.
+ * Internal flag:
+ * BDRV_BLOCK_RAW: used internally to indicate that the request was
+ *                 answered by a passthrough driver such as raw and that the
+ *                 block layer should recompute the answer from bs->file.
  *
- * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
+ * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK)
+ * represent the offset in the returned BDS that is allocated for the
+ * corresponding raw data; however, whether that offset actually contains
+ * data also depends on BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, as follows:
  *
  * DATA ZERO OFFSET_VALID
- *  t    t        t       sectors read as zero, bs->file is zero at offset
- *  t    f        t       sectors read as valid from bs->file at offset
- *  f    t        t       sectors preallocated, read as zero, bs->file not
+ *  t    t        t       sectors read as zero, returned file is zero at offset
+ *  t    f        t       sectors read as valid from file at offset
+ *  f    t        t       sectors preallocated, read as zero, returned file not
  *                        necessarily zero at offset
  *  f    f        t       sectors preallocated but read from backing_hd,
- *                        bs->file contains garbage at offset
+ *                        returned file contains garbage at offset
  *  t    t        f       sectors preallocated, read as zero, unknown offset
  *  t    f        f       sectors read from unknown file or offset
  *  f    t        f       not allocated or unknown offset, read as zero
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8773940..1fdfff7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -165,6 +165,13 @@ struct BlockDriver {
         int64_t offset, int count, BdrvRequestFlags flags);
     int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs,
         int64_t offset, int count);
+
+    /*
+     * Building block for bdrv_block_status[_above]. The driver should
+     * answer only according to the current layer, and should not
+     * set BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
+     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.
+     */
     int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum,
         BlockDriverState **file);
-- 
2.9.3


* [Qemu-devel] [PATCH v12 03/10] qcow2: Correctly report status of preallocated zero clusters
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation Eric Blake
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 20:24   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious Eric Blake
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

We were throwing away the preallocation information associated with
zero clusters.  But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.

count_contiguous_clusters_by_type() is now used only for unallocated
cluster runs, hence it gets renamed and tightened.
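
To make the tightening concrete, here is a minimal standalone sketch
(not the patch itself) of a helper that stops counting as soon as an
entry either changes type or carries a host offset.  The flag and mask
values are assumed to mirror block/qcow2.h; the sk_* names are
hypothetical, and entries are taken in host byte order for brevity,
whereas the real code converts each one with be64_to_cpu().

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to mirror block/qcow2.h. */
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)
#define L2E_OFFSET_MASK       0x00fffffffffffe00ULL

/* Hypothetical stand-ins for the pre-split cluster types. */
enum { SK_UNALLOCATED, SK_NORMAL, SK_COMPRESSED, SK_ZERO };

static int sk_cluster_type(uint64_t entry)
{
    if (entry & QCOW_OFLAG_COMPRESSED) {
        return SK_COMPRESSED;
    } else if (entry & QCOW_OFLAG_ZERO) {
        return SK_ZERO;
    } else if (!(entry & L2E_OFFSET_MASK)) {
        return SK_UNALLOCATED;
    }
    return SK_NORMAL;
}

/*
 * Count leading entries of the wanted type that have no host offset.
 * A preallocated zero cluster has type SK_ZERO but a nonzero offset,
 * so it ends a run of plain zero clusters.
 */
static int sk_count_unallocated(int nb_clusters, const uint64_t *l2_table,
                                int wanted_type)
{
    int i;

    assert(wanted_type == SK_ZERO || wanted_type == SK_UNALLOCATED);
    for (i = 0; i < nb_clusters; i++) {
        uint64_t entry = l2_table[i];

        if (sk_cluster_type(entry) != wanted_type ||
            (entry & L2E_OFFSET_MASK)) {
            break;
        }
    }
    return i;
}
```

With a table holding two plain zero entries followed by a zero entry
with a host offset, the count for SK_ZERO stops at the third entry
instead of merging it into the run.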

Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json.  The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.

The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: rename helper function
v11: reserved for blkdebug half of v10
v10: new patch
---
 block/qcow2-cluster.c | 45 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 335a505..f3bfce6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -334,16 +334,23 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
 	return i;
 }

-static int count_contiguous_clusters_by_type(int nb_clusters,
-                                             uint64_t *l2_table,
-                                             int wanted_type)
+/*
+ * Checks how many consecutive unallocated clusters in a given L2
+ * table have the same cluster type.
+ */
+static int count_contiguous_clusters_unallocated(int nb_clusters,
+                                                 uint64_t *l2_table,
+                                                 int wanted_type)
 {
     int i;

+    assert(wanted_type == QCOW2_CLUSTER_ZERO ||
+           wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
-        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
+        uint64_t entry = be64_to_cpu(l2_table[i]);
+        int type = qcow2_get_cluster_type(entry);

-        if (type != wanted_type) {
+        if (type != wanted_type || entry & L2E_OFFSET_MASK) {
             break;
         }
     }
@@ -565,14 +572,32 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
             ret = -EIO;
             goto fail;
         }
-        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
-                                              QCOW2_CLUSTER_ZERO);
-        *cluster_offset = 0;
+        /* Distinguish between pure zero clusters and pre-allocated ones */
+        if (*cluster_offset & L2E_OFFSET_MASK) {
+            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
+                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
+            *cluster_offset &= L2E_OFFSET_MASK;
+            if (offset_into_cluster(s, *cluster_offset)) {
+                qcow2_signal_corruption(bs, true, -1, -1,
+                                        "Preallocated zero cluster offset %#"
+                                        PRIx64 " unaligned (L2 offset: %#"
+                                        PRIx64 ", L2 index: %#x)",
+                                        *cluster_offset, l2_offset, l2_index);
+                ret = -EIO;
+                goto fail;
+            }
+        } else {
+            c = count_contiguous_clusters_unallocated(nb_clusters,
+                                                      &l2_table[l2_index],
+                                                      QCOW2_CLUSTER_ZERO);
+            *cluster_offset = 0;
+        }
         break;
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
-        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
-                                              QCOW2_CLUSTER_UNALLOCATED);
+        c = count_contiguous_clusters_unallocated(nb_clusters,
+                                                  &l2_table[l2_index],
+                                                  QCOW2_CLUSTER_UNALLOCATED);
         *cluster_offset = 0;
         break;
     case QCOW2_CLUSTER_NORMAL:
-- 
2.9.3


* [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (2 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 03/10] qcow2: Correctly report status of preallocated zero clusters Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 20:51   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 05/10] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Treat plain zero clusters differently from allocated ones, so that
we can simplify the logic of checking whether an offset is present.
Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.

I tried to arrange the enum so that we could use
'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
I didn't actually end up taking advantage of the layout.

In many cases, this leads to simpler code, by properly combining
cases (sometimes, both zero types pair together, other times,
plain zero is more like unallocated while allocated zero is more
like normal).
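
A standalone sketch of the resulting classification, for readers who
want to experiment outside the tree.  The flag and mask values are
assumed to mirror block/qcow2.h; the sk2_* names are hypothetical.
The ordering helper shows the enum layout trick mentioned above, which
the series itself does not end up relying on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror block/qcow2.h. */
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO       (1ULL << 0)
#define L2E_OFFSET_MASK       0x00fffffffffffe00ULL

/* Same order as the enum after this patch. */
enum {
    SK2_UNALLOCATED,
    SK2_ZERO_PLAIN,
    SK2_ZERO_ALLOC,
    SK2_NORMAL,
    SK2_COMPRESSED,
};

static int sk2_cluster_type(uint64_t l2_entry)
{
    if (l2_entry & QCOW_OFLAG_COMPRESSED) {
        return SK2_COMPRESSED;
    } else if (l2_entry & QCOW_OFLAG_ZERO) {
        /* The zero flag alone no longer says whether an offset exists. */
        if (l2_entry & L2E_OFFSET_MASK) {
            return SK2_ZERO_ALLOC;
        }
        return SK2_ZERO_PLAIN;
    } else if (!(l2_entry & L2E_OFFSET_MASK)) {
        return SK2_UNALLOCATED;
    }
    return SK2_NORMAL;
}

/* Layout trick: everything from ZERO_ALLOC upward has host clusters. */
static bool sk2_is_allocated(int type)
{
    return type >= SK2_ZERO_ALLOC;
}
```

Callers that only care whether a host offset is present can then
switch on the type alone, without re-testing L2E_OFFSET_MASK.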

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: new patch
---
 block/qcow2.h          |  8 +++++--
 block/qcow2-cluster.c  | 65 ++++++++++++++++++--------------------------------
 block/qcow2-refcount.c | 40 +++++++++++++------------------
 block/qcow2.c          |  9 ++++---
 4 files changed, 51 insertions(+), 71 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 8731f24..142f81b 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -351,9 +351,10 @@ typedef struct QCowL2Meta

 enum {
     QCOW2_CLUSTER_UNALLOCATED,
+    QCOW2_CLUSTER_ZERO_PLAIN,
+    QCOW2_CLUSTER_ZERO_ALLOC,
     QCOW2_CLUSTER_NORMAL,
     QCOW2_CLUSTER_COMPRESSED,
-    QCOW2_CLUSTER_ZERO
 };

 typedef enum QCow2MetadataOverlap {
@@ -448,7 +449,10 @@ static inline int qcow2_get_cluster_type(uint64_t l2_entry)
     if (l2_entry & QCOW_OFLAG_COMPRESSED) {
         return QCOW2_CLUSTER_COMPRESSED;
     } else if (l2_entry & QCOW_OFLAG_ZERO) {
-        return QCOW2_CLUSTER_ZERO;
+        if (l2_entry & L2E_OFFSET_MASK) {
+            return QCOW2_CLUSTER_ZERO_ALLOC;
+        }
+        return QCOW2_CLUSTER_ZERO_PLAIN;
     } else if (!(l2_entry & L2E_OFFSET_MASK)) {
         return QCOW2_CLUSTER_UNALLOCATED;
     } else {
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f3bfce6..14e2086 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -321,8 +321,7 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
     /* must be allocated */
     first_cluster_type = qcow2_get_cluster_type(first_entry);
     assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
-           (first_cluster_type == QCOW2_CLUSTER_ZERO &&
-            (first_entry & L2E_OFFSET_MASK) != 0));
+           first_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);

     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_table[i]) & mask;
@@ -344,7 +343,7 @@ static int count_contiguous_clusters_unallocated(int nb_clusters,
 {
     int i;

-    assert(wanted_type == QCOW2_CLUSTER_ZERO ||
+    assert(wanted_type == QCOW2_CLUSTER_ZERO_PLAIN ||
            wanted_type == QCOW2_CLUSTER_UNALLOCATED);
     for (i = 0; i < nb_clusters; i++) {
         uint64_t entry = be64_to_cpu(l2_table[i]);
@@ -558,52 +557,32 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
     assert(nb_clusters <= INT_MAX);

     ret = qcow2_get_cluster_type(*cluster_offset);
+    if (s->qcow_version < 3 && (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
+                                ret == QCOW2_CLUSTER_ZERO_ALLOC)) {
+        qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
+                                " in pre-v3 image (L2 offset: %#" PRIx64
+                                ", L2 index: %#x)", l2_offset, l2_index);
+        ret = -EIO;
+        goto fail;
+    }
     switch (ret) {
     case QCOW2_CLUSTER_COMPRESSED:
         /* Compressed clusters can only be processed one by one */
         c = 1;
         *cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
         break;
-    case QCOW2_CLUSTER_ZERO:
-        if (s->qcow_version < 3) {
-            qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
-                                    " in pre-v3 image (L2 offset: %#" PRIx64
-                                    ", L2 index: %#x)", l2_offset, l2_index);
-            ret = -EIO;
-            goto fail;
-        }
-        /* Distinguish between pure zero clusters and pre-allocated ones */
-        if (*cluster_offset & L2E_OFFSET_MASK) {
-            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
-                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
-            *cluster_offset &= L2E_OFFSET_MASK;
-            if (offset_into_cluster(s, *cluster_offset)) {
-                qcow2_signal_corruption(bs, true, -1, -1,
-                                        "Preallocated zero cluster offset %#"
-                                        PRIx64 " unaligned (L2 offset: %#"
-                                        PRIx64 ", L2 index: %#x)",
-                                        *cluster_offset, l2_offset, l2_index);
-                ret = -EIO;
-                goto fail;
-            }
-        } else {
-            c = count_contiguous_clusters_unallocated(nb_clusters,
-                                                      &l2_table[l2_index],
-                                                      QCOW2_CLUSTER_ZERO);
-            *cluster_offset = 0;
-        }
-        break;
+    case QCOW2_CLUSTER_ZERO_PLAIN:
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
         c = count_contiguous_clusters_unallocated(nb_clusters,
-                                                  &l2_table[l2_index],
-                                                  QCOW2_CLUSTER_UNALLOCATED);
+                                                  &l2_table[l2_index], ret);
         *cluster_offset = 0;
         break;
+    case QCOW2_CLUSTER_ZERO_ALLOC:
     case QCOW2_CLUSTER_NORMAL:
         /* how many allocated clusters ? */
         c = count_contiguous_clusters(nb_clusters, s->cluster_size,
-                &l2_table[l2_index], QCOW_OFLAG_ZERO);
+                                      &l2_table[l2_index], QCOW_OFLAG_ZERO);
         *cluster_offset &= L2E_OFFSET_MASK;
         if (offset_into_cluster(s, *cluster_offset)) {
             qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset %#"
@@ -901,7 +880,8 @@ static int count_cow_clusters(BDRVQcow2State *s, int nb_clusters,
             break;
         case QCOW2_CLUSTER_UNALLOCATED:
         case QCOW2_CLUSTER_COMPRESSED:
-        case QCOW2_CLUSTER_ZERO:
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
             break;
         default:
             abort();
@@ -1202,8 +1182,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
      * wrong with our code. */
     assert(nb_clusters > 0);

-    if (qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO &&
-        (entry & L2E_OFFSET_MASK) != 0 && (entry & QCOW_OFLAG_COPIED) &&
+    if (qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO_ALLOC &&
+        (entry & QCOW_OFLAG_COPIED) &&
         (!*host_offset ||
          start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK)))
     {
@@ -1535,13 +1515,13 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
             }
             break;

-        case QCOW2_CLUSTER_ZERO:
-            /* Preallocated zero clusters should be discarded in any case */
-            if (!full_discard && (old_l2_entry & L2E_OFFSET_MASK) == 0) {
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+            if (!full_discard) {
                 continue;
             }
             break;

+        case QCOW2_CLUSTER_ZERO_ALLOC:
         case QCOW2_CLUSTER_NORMAL:
         case QCOW2_CLUSTER_COMPRESSED:
             break;
@@ -1760,7 +1740,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
             int cluster_type = qcow2_get_cluster_type(l2_entry);
             bool preallocated = offset != 0;

-            if (cluster_type != QCOW2_CLUSTER_ZERO) {
+            if (cluster_type != QCOW2_CLUSTER_ZERO_PLAIN &&
+                cluster_type != QCOW2_CLUSTER_ZERO_ALLOC) {
                 continue;
             }

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index a5a0076..fa2ea05 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1028,18 +1028,17 @@ void qcow2_free_any_clusters(BlockDriverState *bs, uint64_t l2_entry,
         }
         break;
     case QCOW2_CLUSTER_NORMAL:
-    case QCOW2_CLUSTER_ZERO:
-        if (l2_entry & L2E_OFFSET_MASK) {
-            if (offset_into_cluster(s, l2_entry & L2E_OFFSET_MASK)) {
-                qcow2_signal_corruption(bs, false, -1, -1,
-                                        "Cannot free unaligned cluster %#llx",
-                                        l2_entry & L2E_OFFSET_MASK);
-            } else {
-                qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
-                                    nb_clusters << s->cluster_bits, type);
-            }
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        if (offset_into_cluster(s, l2_entry & L2E_OFFSET_MASK)) {
+            qcow2_signal_corruption(bs, false, -1, -1,
+                                    "Cannot free unaligned cluster %#llx",
+                                    l2_entry & L2E_OFFSET_MASK);
+        } else {
+            qcow2_free_clusters(bs, l2_entry & L2E_OFFSET_MASK,
+                                nb_clusters << s->cluster_bits, type);
         }
         break;
+    case QCOW2_CLUSTER_ZERO_PLAIN:
     case QCOW2_CLUSTER_UNALLOCATED:
         break;
     default:
@@ -1144,7 +1143,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     break;

                 case QCOW2_CLUSTER_NORMAL:
-                case QCOW2_CLUSTER_ZERO:
+                case QCOW2_CLUSTER_ZERO_ALLOC:
                     if (offset_into_cluster(s, offset_masked)) {
                         qcow2_signal_corruption(bs, true, -1, -1, "Data "
                                                 "cluster offset %#" PRIx64
@@ -1157,11 +1156,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     }

                     cluster_index = offset_masked >> s->cluster_bits;
-                    if (!cluster_index) {
-                        /* unallocated */
-                        refcount = 0;
-                        break;
-                    }
+                    assert(cluster_index);
                     if (addend != 0) {
                         ret = qcow2_update_cluster_refcount(bs,
                                     cluster_index, abs(addend), addend < 0,
@@ -1177,6 +1172,7 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     }
                     break;

+                case QCOW2_CLUSTER_ZERO_PLAIN:
                 case QCOW2_CLUSTER_UNALLOCATED:
                     refcount = 0;
                     break;
@@ -1443,12 +1439,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
             }
             break;

-        case QCOW2_CLUSTER_ZERO:
-            if ((l2_entry & L2E_OFFSET_MASK) == 0) {
-                break;
-            }
-            /* fall through */
-
+        case QCOW2_CLUSTER_ZERO_ALLOC:
         case QCOW2_CLUSTER_NORMAL:
         {
             uint64_t offset = l2_entry & L2E_OFFSET_MASK;
@@ -1478,6 +1469,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
             break;
         }

+        case QCOW2_CLUSTER_ZERO_PLAIN:
         case QCOW2_CLUSTER_UNALLOCATED:
             break;

@@ -1642,8 +1634,8 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
             uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
             int cluster_type = qcow2_get_cluster_type(l2_entry);

-            if ((cluster_type == QCOW2_CLUSTER_NORMAL) ||
-                ((cluster_type == QCOW2_CLUSTER_ZERO) && (data_offset != 0))) {
+            if (cluster_type == QCOW2_CLUSTER_NORMAL ||
+                cluster_type == QCOW2_CLUSTER_ZERO_ALLOC) {
                 ret = qcow2_get_refcount(bs,
                                          data_offset >> s->cluster_bits,
                                          &refcount);
diff --git a/block/qcow2.c b/block/qcow2.c
index f5a72a4..dded5a0 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1385,7 +1385,7 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
         *file = bs->file->bs;
         status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
     }
-    if (ret == QCOW2_CLUSTER_ZERO) {
+    if (ret == QCOW2_CLUSTER_ZERO_PLAIN || ret == QCOW2_CLUSTER_ZERO_ALLOC) {
         status |= BDRV_BLOCK_ZERO;
     } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
         status |= BDRV_BLOCK_DATA;
@@ -1482,7 +1482,8 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
             }
             break;

-        case QCOW2_CLUSTER_ZERO:
+        case QCOW2_CLUSTER_ZERO_PLAIN:
+        case QCOW2_CLUSTER_ZERO_ALLOC:
             qemu_iovec_memset(&hd_qiov, 0, 0, cur_bytes);
             break;

@@ -2491,7 +2492,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
         count = s->cluster_size;
         nr = s->cluster_size;
         ret = qcow2_get_cluster_offset(bs, offset, &nr, &off);
-        if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_ZERO) {
+        if (ret != QCOW2_CLUSTER_UNALLOCATED &&
+            ret != QCOW2_CLUSTER_ZERO_PLAIN &&
+            ret != QCOW2_CLUSTER_ZERO_ALLOC) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
-- 
2.9.3


* [Qemu-devel] [PATCH v12 05/10] qcow2: Optimize zero_single_l2() to minimize L2 churn
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (3 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 20:55   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map Eric Blake
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Similar to discard_single_l2(), we should try to avoid dirtying
the L2 cache when the cluster we are changing already has the
right characteristics.

Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
is a requirement to unallocate a cluster (the block layer clears
that flag if the discard.* flags at open time requested that we
never punch holes; see the conversation around commit 170f4b2e,
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
Therefore, this patch can only reuse a zero cluster as-is if either
unmapping is not requested, or if the zero cluster was not associated
with an allocation.

Technically, there are some cases where an unallocated cluster
already reads as all zeroes (namely, when there is no backing file
[easy: check bs->backing], or when the backing file also reads as
zeroes [harder: we can't check bdrv_get_block_status since we are
already holding the lock]), where the guest would not immediately see
a difference if we left that cluster unallocated.  But if the user
did not request unmapping, leaving an unallocated cluster is wrong;
and even if the user DID request unmapping, keeping a cluster
unallocated risks a subtle semantic change of guest-visible contents
if a backing file is later added, and it is not worth auditing
whether all internal uses such as mirror properly avoid an unmap
request.  Thus, this patch is intentionally limited to just clusters
that are already marked as zero.
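
As an illustration only (not part of the patch), the per-entry
decision described above can be sketched as a small standalone
function.  The enum names and zero_action_for() are hypothetical
stand-ins for the qcow2 cluster types and for the body of the loop in
zero_single_l2(); the last case assumes that the branch not shown in
the hunk keeps the existing mapping and only adds the zero flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the qcow2 cluster types used in this series. */
enum cluster_type {
    CLUSTER_UNALLOCATED,
    CLUSTER_ZERO_PLAIN,   /* zero flag set, no host cluster attached */
    CLUSTER_ZERO_ALLOC,   /* zero flag set, host cluster still allocated */
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
};

/* Hypothetical summary of what happens to one L2 entry. */
enum zero_action {
    ACTION_SKIP,          /* entry is already right: L2 table stays clean */
    ACTION_FREE_AND_ZERO, /* drop any host cluster, store a bare zero flag */
    ACTION_FLAG_ZERO,     /* assumed: keep the mapping, add the zero flag */
};

enum zero_action zero_action_for(enum cluster_type type, bool unmap)
{
    /* Already reads as zero and the allocation state is acceptable. */
    if (type == CLUSTER_ZERO_PLAIN ||
        (type == CLUSTER_ZERO_ALLOC && !unmap)) {
        return ACTION_SKIP;
    }
    /* Compressed clusters cannot carry a zero flag; unmap asks to free. */
    if (type == CLUSTER_COMPRESSED || unmap) {
        return ACTION_FREE_AND_ZERO;
    }
    return ACTION_FLAG_ZERO;
}
```

Only the ACTION_SKIP cases avoid marking the L2 cache entry dirty; a
preallocated zero cluster with unmap requested still has to be
rewritten, because its host cluster must be released.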

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: store cluster type in temporary
v11: reserved for blkdebug half of v10
v10: new patch, replacing earlier attempt to use unallocated clusters,
and ditching any optimization of v2 files
---
 block/qcow2-cluster.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 14e2086..78fbe34 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1599,6 +1599,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     int l2_index;
     int ret;
     int i;
+    bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);

     ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
     if (ret < 0) {
@@ -1611,12 +1612,22 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,

     for (i = 0; i < nb_clusters; i++) {
         uint64_t old_offset;
+        int cluster_type;

         old_offset = be64_to_cpu(l2_table[l2_index + i]);

-        /* Update L2 entries */
+        /*
+         * Minimize L2 changes if the cluster already reads back as
+         * zeroes with correct allocation.
+         */
+        cluster_type = qcow2_get_cluster_type(old_offset);
+        if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
+            (cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
+            continue;
+        }
+
         qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
-        if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
+        if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
             l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-- 
2.9.3


* [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (4 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 05/10] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 20:58   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Although _filter_qemu_img_map documents that it scrubs offsets, it
was only doing so for human mode.  Of the existing tests using the
filter (97, 122, 150, 154, 176), two of them are affected, but it
does not hurt the validity of the tests to not require particular
mappings (another test, 66, uses offsets but intentionally does not
pass through _filter_qemu_img_map, because it checks that offsets
are unchanged before and after an operation).

Another justification for this patch is that it will allow a future
patch to utilize 'qemu-img map --output=json' to check the status of
preallocated zero clusters without regard to the mapping (since
the qcow2 mapping can be very sensitive to the chosen cluster size,
when preallocation is not in use).
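
For readers who want to see the effect of the new JSON rule outside
of sed, here is a rough C equivalent of the substitution
's/"offset": [0-9]\+/"offset": OFFSET/g'.  It is a sketch only:
scrub_json_offsets() is a hypothetical helper, not something the
iotests provide, and it handles just the JSON rule, not the
human-mode one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, replacing the digits that
 * follow every '"offset": ' key with the literal word OFFSET.
 * Returns 0 on success, or -1 if dst is too small.
 */
int scrub_json_offsets(const char *src, char *dst, size_t dst_size)
{
    static const char key[] = "\"offset\": ";
    static const char token[] = "OFFSET";
    const size_t key_len = sizeof(key) - 1;
    const size_t token_len = sizeof(token) - 1;
    size_t out = 0;

    if (dst_size == 0) {
        return -1;
    }
    while (*src) {
        if (strncmp(src, key, key_len) == 0 &&
            isdigit((unsigned char)src[key_len])) {
            /* Leave room for the key, the token, and the final NUL. */
            if (out + key_len + token_len >= dst_size) {
                return -1;
            }
            memcpy(dst + out, key, key_len);
            out += key_len;
            memcpy(dst + out, token, token_len);
            out += token_len;
            /* Skip the key and the run of digits in the input. */
            src += key_len;
            while (isdigit((unsigned char)*src)) {
                src++;
            }
        } else {
            if (out + 1 >= dst_size) {
                return -1;
            }
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

Like the sed rule, the helper leaves "start" and "length" untouched,
so the expected output still pins down which ranges are data or zero
without pinning down where they landed in the image file.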

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: new patch
---
 tests/qemu-iotests/common.filter |  4 +++-
 tests/qemu-iotests/122.out       | 16 ++++++++--------
 tests/qemu-iotests/154.out       | 30 +++++++++++++++---------------
 3 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index f58548d..2c14f15 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -152,10 +152,12 @@ _filter_img_info()
         -e "/log_size: [0-9]\\+/d"
 }

-# filter out offsets and file names from qemu-img map
+# filter out offsets and file names from qemu-img map; good for both
+# human and json output
 _filter_qemu_img_map()
 {
     sed -e 's/\([0-9a-fx]* *[0-9a-fx]* *\)[0-9a-fx]* */\1/g' \
+	-e 's/"offset": [0-9]\+/"offset": OFFSET/g' \
         -e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt
 }

diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index 9317d80..47d8656 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -112,7 +112,7 @@ read 3145728/3145728 bytes at offset 0
 3 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 63963136/63963136 bytes at offset 3145728
 61 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": 327680}]
+[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]

 convert -c -S 0:
 read 3145728/3145728 bytes at offset 0
@@ -134,7 +134,7 @@ read 30408704/30408704 bytes at offset 3145728
 29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 33554432/33554432 bytes at offset 33554432
 32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": 327680}]
+[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]

 convert -c -S 0 with source backing file:
 read 3145728/3145728 bytes at offset 0
@@ -152,7 +152,7 @@ read 30408704/30408704 bytes at offset 3145728
 29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 33554432/33554432 bytes at offset 33554432
 32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": 327680}]
+[{ "start": 0, "length": 67108864, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]

 convert -c -S 0 -B ...
 read 3145728/3145728 bytes at offset 0
@@ -176,11 +176,11 @@ wrote 1024/1024 bytes at offset 17408
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

 convert -S 4k
-[{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": 8192},
+[{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 1024, "length": 7168, "depth": 0, "zero": true, "data": false},
-{ "start": 8192, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": 9216},
+{ "start": 8192, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false},
-{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": 10240},
+{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}]

 convert -c -S 4k
@@ -192,9 +192,9 @@ convert -c -S 4k
 { "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}]

 convert -S 8k
-[{ "start": 0, "length": 9216, "depth": 0, "zero": false, "data": true, "offset": 8192},
+[{ "start": 0, "length": 9216, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false},
-{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": 17408},
+{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}]

 convert -c -S 8k
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index da9eabd..d3b68e7 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -42,9 +42,9 @@ read 1024/1024 bytes at offset 65536
 read 2048/2048 bytes at offset 67584
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 36864, "length": 28672, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 69632, "length": 134148096, "depth": 1, "zero": true, "data": false}]

 == backing file contains non-zero data after write_zeroes ==
@@ -69,9 +69,9 @@ read 1024/1024 bytes at offset 44032
 read 3072/3072 bytes at offset 40960
 3 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 36864, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 40960, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 40960, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 45056, "length": 134172672, "depth": 1, "zero": true, "data": false}]

 == write_zeroes covers non-zero data ==
@@ -143,13 +143,13 @@ read 1024/1024 bytes at offset 67584
 read 5120/5120 bytes at offset 68608
 5 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 36864, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 49152, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 49152, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 53248, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": false}]

@@ -186,13 +186,13 @@ read 1024/1024 bytes at offset 72704
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
 { "start": 32768, "length": 4096, "depth": 0, "zero": true, "data": false},
-{ "start": 36864, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 36864, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
 { "start": 49152, "length": 4096, "depth": 0, "zero": true, "data": false},
-{ "start": 53248, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 53248, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
 { "start": 65536, "length": 4096, "depth": 0, "zero": true, "data": false},
-{ "start": 69632, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 69632, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": false}]

 == spanning two clusters, partially overwriting backing file ==
@@ -212,7 +212,7 @@ read 1024/1024 bytes at offset 5120
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 2048/2048 bytes at offset 6144
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[{ "start": 0, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 20480},
+[{ "start": 0, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 8192, "length": 134209536, "depth": 1, "zero": true, "data": false}]

 == spanning multiple clusters, non-zero in first cluster ==
@@ -227,7 +227,7 @@ read 2048/2048 bytes at offset 65536
 read 10240/10240 bytes at offset 67584
 10 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 65536, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 69632, "length": 8192, "depth": 0, "zero": true, "data": false},
 { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]

@@ -257,7 +257,7 @@ read 2048/2048 bytes at offset 75776
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 65536, "depth": 1, "zero": true, "data": false},
 { "start": 65536, "length": 8192, "depth": 0, "zero": true, "data": false},
-{ "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]

 == spanning multiple clusters, partially overwriting backing file ==
@@ -278,8 +278,8 @@ read 2048/2048 bytes at offset 74752
 read 1024/1024 bytes at offset 76800
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 65536, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
-{ "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]
 *** done
-- 
2.9.3


* [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (5 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 21:24   ` Max Reitz
  2017-05-05 22:29   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

No tests were covering write zeroes with unmap.  Additionally,
I needed to prove that my previous patches for correct status
reporting and write zeroes optimizations actually had an impact.

The test works for cluster_size between 8k and 2M (for smaller
sizes, it fails because our allocation patterns are not contiguous
with small clusters - in part, the largest consecutive allocation
we tend to get is often bounded by the size covered by one L2
table).

Note that testing for zero clusters is tricky: 'qemu-io map'
reports whether data comes from the current layer of the image
(useful for sniffing out which regions of the file have
QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
while 'qemu-img map' sees "zero":true for both unallocated and
zero clusters for any qcow2 with no backing layer (so less useful
at detecting true zero clusters), but reliably shows mappings.
So we have to rely on both queries side-by-side at each point of
the test.
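
To make the "both queries" point concrete, the following sketch
tabulates what each command is expected to show per cluster state.
It assumes an image with no backing file and leaves compressed
clusters out; the type and function names are hypothetical and exist
only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the cluster states the test tells apart. */
enum probed_cluster {
    PROBED_UNALLOCATED,
    PROBED_ZERO_PLAIN,   /* QCOW_OFLAG_ZERO, no host mapping */
    PROBED_ZERO_ALLOC,   /* QCOW_OFLAG_ZERO, host mapping kept */
    PROBED_NORMAL,
};

/* What the two map commands show for one cluster. */
struct map_view {
    bool io_allocated;   /* 'qemu-io map' prints "allocated" */
    bool img_zero;       /* 'qemu-img map' JSON has "zero": true */
    bool img_has_offset; /* 'qemu-img map' JSON includes "offset" */
};

struct map_view expected_view(enum probed_cluster type)
{
    struct map_view v = { false, false, false };

    switch (type) {
    case PROBED_UNALLOCATED:
        v.img_zero = true;          /* no backing file: reads as zero */
        break;
    case PROBED_ZERO_PLAIN:
        v.io_allocated = true;
        v.img_zero = true;
        break;
    case PROBED_ZERO_ALLOC:
        v.io_allocated = true;
        v.img_zero = true;
        v.img_has_offset = true;
        break;
    case PROBED_NORMAL:
        v.io_allocated = true;
        v.img_has_offset = true;
        break;
    }
    return v;
}

/* True if both commands print the same thing for the two views. */
bool same_view(struct map_view a, struct map_view b)
{
    return a.io_allocated == b.io_allocated &&
           a.img_zero == b.img_zero &&
           a.img_has_offset == b.img_has_offset;
}
```

Neither column suffices alone: 'qemu-img map' shows the same thing
for an unallocated cluster and a plain zero cluster, while 'qemu-io
map' shows the same thing for plain and preallocated zero clusters.
Taken together, the three fields differ for every pair of states.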

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: probe the map in more places, to make test easier to follow
v11: reserved for blkdebug half of v10
v10: drop any changes to v2 files, rewrite test to work with updates
earlier in the series, add a blkdebug probe
v9: new patch
---
 tests/qemu-iotests/179     | 132 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/179.out | 160 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 293 insertions(+)
 create mode 100755 tests/qemu-iotests/179
 create mode 100644 tests/qemu-iotests/179.out

diff --git a/tests/qemu-iotests/179 b/tests/qemu-iotests/179
new file mode 100755
index 0000000..124666f
--- /dev/null
+++ b/tests/qemu-iotests/179
@@ -0,0 +1,132 @@
+#!/bin/bash
+#
+# Test case for write zeroes with unmap
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# v2 images can't mark clusters as zero
+_unsupported_imgopts compat=0.10
+
+echo
+echo '=== Testing write zeroes with unmap ==='
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 64M
+_make_test_img -b "$TEST_IMG.base"
+
+# Offsets chosen at or near 2M boundaries so test works at any cluster size
+
+# Aligned writes to unallocated cluster should not allocate mapping, but must
+# mark cluster as zero, whether or not unmap was requested
+$QEMU_IO -c "write -z -u 2M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 6M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "map" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# Unaligned writes need not allocate mapping if the cluster already reads
+# as zero, but must mark cluster as zero, whether or not unmap was requested
+$QEMU_IO -c "write -z -u 10485761 2097150" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 14680065 2097150" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "map" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# Requesting unmap of normal data must deallocate; omitting unmap should
+# preserve the mapping
+$QEMU_IO -c "write 18M 14M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 20M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 24M 6M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "map" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# Likewise when writing on already-mapped zero data
+$QEMU_IO -c "write -z -u 26M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 28M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "map" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# Writing on unmapped zeroes does not allocate
+$QEMU_IO -c "write -z 32M 8M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 34M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 36M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "map" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# Writing zero overrides a backing file, regardless of backing cluster type
+$QEMU_IO -c "write -z 40M 8M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write 48M 8M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 42M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z 44M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 50M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z 52M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 58M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z 60M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "map" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Final check that mappings are correct and images are still sane
+TEST_IMG="$TEST_IMG.base" _check_test_img
+_check_test_img
+
+echo
+echo '=== Testing cache optimization ==='
+echo
+
+BLKDBG_TEST_IMG="blkdebug:$TEST_DIR/blkdebug.conf:$TEST_IMG.base"
+
+cat > "$TEST_DIR/blkdebug.conf" <<EOF
+[inject-error]
+event = "l2_update"
+errno = "5"
+immediately = "on"
+once = "off"
+EOF
+
+# None of the following writes should trigger an L2 update, because the
+# cluster already reads as zero, and we don't have to change allocation
+$QEMU_IO -c "w -z -u 20M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "w -z 20M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "w -z 28M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+
+# Proof that our blkdebug hook works
+$QEMU_IO -c "w -z -u 0M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+
+# success, all done
+echo '*** done'
+status=0
diff --git a/tests/qemu-iotests/179.out b/tests/qemu-iotests/179.out
new file mode 100644
index 0000000..d66b24e
--- /dev/null
+++ b/tests/qemu-iotests/179.out
@@ -0,0 +1,160 @@
+QA output created by 179
+
+=== Testing write zeroes with unmap ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 2097152/2097152 bytes at offset 2097152
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 6291456
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2 MiB (0x200000) bytes not allocated at offset 0 bytes (0x0)
+2 MiB (0x200000) bytes     allocated at offset 2 MiB (0x200000)
+2 MiB (0x200000) bytes not allocated at offset 4 MiB (0x400000)
+2 MiB (0x200000) bytes     allocated at offset 6 MiB (0x600000)
+56 MiB (0x3800000) bytes not allocated at offset 8 MiB (0x800000)
+[{ "start": 0, "length": 67108864, "depth": 0, "zero": true, "data": false}]
+wrote 2097150/2097150 bytes at offset 10485761
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097150/2097150 bytes at offset 14680065
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2 MiB (0x200000) bytes not allocated at offset 0 bytes (0x0)
+2 MiB (0x200000) bytes     allocated at offset 2 MiB (0x200000)
+2 MiB (0x200000) bytes not allocated at offset 4 MiB (0x400000)
+2 MiB (0x200000) bytes     allocated at offset 6 MiB (0x600000)
+2 MiB (0x200000) bytes not allocated at offset 8 MiB (0x800000)
+2 MiB (0x200000) bytes     allocated at offset 10 MiB (0xa00000)
+2 MiB (0x200000) bytes not allocated at offset 12 MiB (0xc00000)
+2 MiB (0x200000) bytes     allocated at offset 14 MiB (0xe00000)
+48 MiB (0x3000000) bytes not allocated at offset 16 MiB (0x1000000)
+[{ "start": 0, "length": 67108864, "depth": 0, "zero": true, "data": false}]
+wrote 14680064/14680064 bytes at offset 18874368
+14 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 6291456/6291456 bytes at offset 25165824
+6 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2 MiB (0x200000) bytes not allocated at offset 0 bytes (0x0)
+2 MiB (0x200000) bytes     allocated at offset 2 MiB (0x200000)
+2 MiB (0x200000) bytes not allocated at offset 4 MiB (0x400000)
+2 MiB (0x200000) bytes     allocated at offset 6 MiB (0x600000)
+2 MiB (0x200000) bytes not allocated at offset 8 MiB (0x800000)
+2 MiB (0x200000) bytes     allocated at offset 10 MiB (0xa00000)
+2 MiB (0x200000) bytes not allocated at offset 12 MiB (0xc00000)
+2 MiB (0x200000) bytes     allocated at offset 14 MiB (0xe00000)
+2 MiB (0x200000) bytes not allocated at offset 16 MiB (0x1000000)
+14 MiB (0xe00000) bytes     allocated at offset 18 MiB (0x1200000)
+32 MiB (0x2000000) bytes not allocated at offset 32 MiB (0x2000000)
+[{ "start": 0, "length": 18874368, "depth": 0, "zero": true, "data": false},
+{ "start": 18874368, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20971520, "length": 2097152, "depth": 0, "zero": true, "data": false},
+{ "start": 23068672, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 25165824, "length": 6291456, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 31457280, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 33554432, "length": 33554432, "depth": 0, "zero": true, "data": false}]
+wrote 2097152/2097152 bytes at offset 27262976
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 29360128
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2 MiB (0x200000) bytes not allocated at offset 0 bytes (0x0)
+2 MiB (0x200000) bytes     allocated at offset 2 MiB (0x200000)
+2 MiB (0x200000) bytes not allocated at offset 4 MiB (0x400000)
+2 MiB (0x200000) bytes     allocated at offset 6 MiB (0x600000)
+2 MiB (0x200000) bytes not allocated at offset 8 MiB (0x800000)
+2 MiB (0x200000) bytes     allocated at offset 10 MiB (0xa00000)
+2 MiB (0x200000) bytes not allocated at offset 12 MiB (0xc00000)
+2 MiB (0x200000) bytes     allocated at offset 14 MiB (0xe00000)
+2 MiB (0x200000) bytes not allocated at offset 16 MiB (0x1000000)
+14 MiB (0xe00000) bytes     allocated at offset 18 MiB (0x1200000)
+32 MiB (0x2000000) bytes not allocated at offset 32 MiB (0x2000000)
+[{ "start": 0, "length": 18874368, "depth": 0, "zero": true, "data": false},
+{ "start": 18874368, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20971520, "length": 2097152, "depth": 0, "zero": true, "data": false},
+{ "start": 23068672, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 25165824, "length": 2097152, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 27262976, "length": 2097152, "depth": 0, "zero": true, "data": false},
+{ "start": 29360128, "length": 2097152, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 31457280, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 33554432, "length": 33554432, "depth": 0, "zero": true, "data": false}]
+wrote 8388608/8388608 bytes at offset 33554432
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 35651584
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 37748736
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2 MiB (0x200000) bytes not allocated at offset 0 bytes (0x0)
+2 MiB (0x200000) bytes     allocated at offset 2 MiB (0x200000)
+2 MiB (0x200000) bytes not allocated at offset 4 MiB (0x400000)
+2 MiB (0x200000) bytes     allocated at offset 6 MiB (0x600000)
+2 MiB (0x200000) bytes not allocated at offset 8 MiB (0x800000)
+2 MiB (0x200000) bytes     allocated at offset 10 MiB (0xa00000)
+2 MiB (0x200000) bytes not allocated at offset 12 MiB (0xc00000)
+2 MiB (0x200000) bytes     allocated at offset 14 MiB (0xe00000)
+2 MiB (0x200000) bytes not allocated at offset 16 MiB (0x1000000)
+22 MiB (0x1600000) bytes     allocated at offset 18 MiB (0x1200000)
+24 MiB (0x1800000) bytes not allocated at offset 40 MiB (0x2800000)
+[{ "start": 0, "length": 18874368, "depth": 0, "zero": true, "data": false},
+{ "start": 18874368, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20971520, "length": 2097152, "depth": 0, "zero": true, "data": false},
+{ "start": 23068672, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 25165824, "length": 2097152, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 27262976, "length": 2097152, "depth": 0, "zero": true, "data": false},
+{ "start": 29360128, "length": 2097152, "depth": 0, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 31457280, "length": 2097152, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 33554432, "length": 33554432, "depth": 0, "zero": true, "data": false}]
+wrote 8388608/8388608 bytes at offset 41943040
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 8388608/8388608 bytes at offset 50331648
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 44040192
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 46137344
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 52428800
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 54525952
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 60817408
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 62914560
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+42 MiB (0x2a00000) bytes not allocated at offset 0 bytes (0x0)
+4 MiB (0x400000) bytes     allocated at offset 42 MiB (0x2a00000)
+4 MiB (0x400000) bytes not allocated at offset 46 MiB (0x2e00000)
+4 MiB (0x400000) bytes     allocated at offset 50 MiB (0x3200000)
+4 MiB (0x400000) bytes not allocated at offset 54 MiB (0x3600000)
+4 MiB (0x400000) bytes     allocated at offset 58 MiB (0x3a00000)
+2 MiB (0x200000) bytes not allocated at offset 62 MiB (0x3e00000)
+[{ "start": 0, "length": 18874368, "depth": 1, "zero": true, "data": false},
+{ "start": 18874368, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20971520, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 23068672, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 25165824, "length": 2097152, "depth": 1, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 27262976, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 29360128, "length": 2097152, "depth": 1, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 31457280, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 33554432, "length": 10485760, "depth": 1, "zero": true, "data": false},
+{ "start": 44040192, "length": 4194304, "depth": 0, "zero": true, "data": false},
+{ "start": 48234496, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 50331648, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 52428800, "length": 4194304, "depth": 0, "zero": true, "data": false},
+{ "start": 56623104, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 58720256, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 60817408, "length": 4194304, "depth": 0, "zero": true, "data": false},
+{ "start": 65011712, "length": 2097152, "depth": 1, "zero": true, "data": false}]
+No errors were found on the image.
+No errors were found on the image.
+
+=== Testing cache optimization ===
+
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 29360128
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Failed to flush the L2 table cache: Input/output error
+Failed to flush the refcount block cache: Input/output error
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 9b9bb4c..395c72a 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -170,4 +170,5 @@
 175 auto quick
 176 rw auto backing
 177 rw auto quick
+179 rw auto quick
 181 rw auto migration
-- 
2.9.3

* [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (6 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 22:06   ` Max Reitz
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 09/10] qcow2: Assert that cluster operations are aligned Eric Blake
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes.  The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.
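
As a rough illustration of the tail special case (a standalone sketch,
not the code in block/qcow2.c; the helper names zero_request_tail() and
clamp_sector_count() are made up for this example), the two checks
amount to:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes by which a write-zeroes request
 * [offset, offset + count) extends past the last cluster boundary.
 * A request that ends exactly at the image end is treated as having no
 * tail, because anything beyond the image end reads as zero anyway.
 */
uint32_t zero_request_tail(uint64_t offset, uint64_t count,
                           uint32_t cluster_size, uint64_t image_size)
{
    assert(offset + count <= image_size);
    if (offset + count == image_size) {
        return 0;
    }
    return (uint32_t)((offset + count) % cluster_size);
}

/*
 * Hypothetical helper: limit a sector count so that a "is this range
 * all zero?" query never looks past the end of the image.
 */
int64_t clamp_sector_count(int64_t start, int64_t count,
                           int64_t total_sectors)
{
    if (start + count > total_sectors) {
        count = total_sectors - start;
    }
    return count;
}
```

With 4k clusters and an image of 128M + 2048 bytes, a 512-byte request
ending at the image end has no tail, while a 512-byte request at the
start of that same partial cluster still has a 512-byte tail.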

However, note that for now, the special case of end-of-file is only
recognized if there is no backing file, or if the backing file has
the same length; that's because when the backing file is shorter
than the active layer, we don't have code in place to recognize
that reads of a sector unallocated at the top and beyond the backing
end-of-file are implicitly zero.  It's not much of a real loss,
because most people don't use images that aren't cluster-aligned,
or where the active layer is a different size than the backing
layer (especially where the difference falls within a single cluster).

Update test 154 to cover the new scenarios, using two images of
intentionally differing length.

While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v12: fix testsuite problems, document shortcoming of differing
size of backing file
v11: reserved for blkdebug half of v10
v10: rebase to better reporting of preallocated zero clusters
v9: new patch
---
 block/qcow2.c              |   7 ++
 tests/qemu-iotests/154     | 160 ++++++++++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/154.out | 129 ++++++++++++++++++++++++++++++++++++
 3 files changed, 294 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dded5a0..3478bb6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2451,6 +2451,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
     BlockDriverState *file;
     int64_t res;

+    if (start + count > bs->total_sectors) {
+        count = bs->total_sectors - start;
+    }
+
     if (!count) {
         return true;
     }
@@ -2469,6 +2473,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     uint32_t tail = (offset + count) % s->cluster_size;

     trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count);
+    if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
+        tail = 0;
+    }

     if (head || tail) {
         int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index 7ca7219..687b8f3 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -2,7 +2,7 @@
 #
 # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034)
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Copyright (C) 2016-2017 Red Hat, Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -42,7 +42,10 @@ _supported_proto file
 _supported_os Linux

 CLUSTER_SIZE=4k
-size=128M
+size=$((128 * 1024 * 1024))
+
+# This test requires zero clusters, added in v3 images
+_unsupported_imgopts compat=0.10

 echo
 echo == backing file contains zeros ==
@@ -299,6 +302,159 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io

 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

+echo
+echo == unaligned image tail cluster, no allocation needed ==
+
+# With no backing file, write to all or part of unallocated partial cluster
+# will mark the cluster as zero, but does not allocate.
+# Re-create the image each time to get back to unallocated clusters.
+
+# Write at the front: sector-wise, the request is: 128m... | 00 -- -- --
+_make_test_img $((size + 2048))
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write at the back: sector-wise, the request is: 128m... | -- -- -- 00
+_make_test_img $((size + 2048))
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write at middle: sector-wise, the request is: 128m... | -- 00 00 --
+_make_test_img $((size + 2048))
+$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write entire cluster: sector-wise, the request is: 128m... | 00 00 00 00
+_make_test_img $((size + 2048))
+$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Repeat with backing file holding unallocated cluster.
+# TODO: Note that this forces an allocation, because we aren't yet able to
+# quickly detect that reads beyond EOF of the backing file are always zero
+CLUSTER_SIZE=2048 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+
+# Write at the front: sector-wise, the request is:
+# backing: 128m... | -- --
+# active:  128m... | 00 -- -- --
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write at the back: sector-wise, the request is:
+# backing: 128m... | -- --
+# active:  128m... | -- -- -- 00
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write at middle: sector-wise, the request is:
+# backing: 128m... | -- --
+# active:  128m... | -- 00 00 --
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write entire cluster: sector-wise, the request is:
+# backing: 128m... | -- --
+# active:  128m... | 00 00 00 00
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Repeat with backing file holding zero'd cluster
+# TODO: Note that this forces an allocation, because we aren't yet able to
+# quickly detect that reads beyond EOF of the backing file are always zero
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Write at the front: sector-wise, the request is:
+# backing: 128m... | 00 00
+# active:  128m... | 00 -- -- --
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write at the back: sector-wise, the request is:
+# backing: 128m... | 00 00
+# active:  128m... | -- -- -- 00
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write at middle: sector-wise, the request is:
+# backing: 128m... | 00 00
+# active:  128m... | -- 00 00 --
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Write entire cluster: sector-wise, the request is:
+# backing: 128m... | 00 00
+# active:  128m... | 00 00 00 00
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# A preallocated cluster maintains its allocation, whether it stays as
+# data due to a partial write:
+# Convert 128m... | XX XX => ... | XX 00
+_make_test_img $((size + 1024))
+$QEMU_IO -c "write -P 1 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# or because it is the entire cluster and can use the zero flag:
+# Convert 128m... | XX XX => ... | 00 00
+$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc $size 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $size 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+echo
+echo == unaligned image tail cluster, allocation required ==
+
+# Write beyond backing file must COW
+# Backing file: 128m... | XX --
+# Active layer: 128m... | -- -- 00 --
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Writes at boundaries of (partial) cluster must not lose mid-cluster data
+# Backing file: 128m: ... | -- XX
+# Active layer: 128m: ... | 00 -- -- 00
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index d3b68e7..b86f074 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -282,4 +282,133 @@ read 1024/1024 bytes at offset 76800
 { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]
+
+== unaligned image tail cluster, no allocation needed ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134219776, "depth": 0, "zero": true, "data": false}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134219776, "depth": 0, "zero": true, "data": false}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776
+wrote 1024/1024 bytes at offset 134218240
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134219776, "depth": 0, "zero": true, "data": false}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776
+wrote 2048/2048 bytes at offset 134217728
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134219776, "depth": 0, "zero": true, "data": false}]
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 1024/1024 bytes at offset 134218240
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 2048/2048 bytes at offset 134217728
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 1024/1024 bytes at offset 134218240
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 2048/2048 bytes at offset 134217728
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134218752
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Pattern verification failed at offset 134218240, 512 bytes
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+1024/1024 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false},
+{ "start": 134217728, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+1024/1024 bytes allocated at offset 128 MiB
+read 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false},
+{ "start": 134217728, "length": 1024, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+
+== unaligned image tail cluster, allocation required ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1536/1536 bytes at offset 134218240
+1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
 *** done
-- 
2.9.3

* [Qemu-devel] [PATCH v12 09/10] qcow2: Assert that cluster operations are aligned
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (7 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 10/10] qcow2: Discard/zero clusters by byte count Eric Blake
  2017-05-05 22:18 ` [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Max Reitz
  10 siblings, 0 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.

Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.
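
The tightened rule can be sketched on its own as follows.  This is only
an illustration under the assumption that offsets are plain byte
values; cluster_range_ok() and assert_cluster_range() are hypothetical
names, not functions in block/qcow2-cluster.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: the start must be cluster-aligned, and the end
 * must either be cluster-aligned or coincide with the image end.
 */
bool cluster_range_ok(uint64_t offset, uint64_t end_offset,
                      uint64_t cluster_size, uint64_t image_size)
{
    bool start_aligned = (offset % cluster_size) == 0;
    bool end_aligned = (end_offset % cluster_size) == 0;

    return start_aligned && (end_aligned || end_offset == image_size);
}

/* Hypothetical wrapper that turns the check into a hard assertion. */
void assert_cluster_range(uint64_t offset, uint64_t end_offset,
                          uint64_t cluster_size, uint64_t image_size)
{
    assert(cluster_range_ok(offset, end_offset, cluster_size, image_size));
}
```

For an image of 128M + 2048 bytes with 4k clusters, a range covering
the final partial cluster up to the image end passes, while a range
that stops 512 bytes into that cluster does not.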

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v12: no change
v11: reserved for blkdebug half of v10
v10: rebase to context
v9: rebase to master, by asserting that only tail cluster is unaligned
v7, v8: only earlier half of series submitted for 2.9
v6: avoid assertion on non-cluster-aligned image, use s->cluster_sectors
to avoid a shift, drop R-b
v5: no change
v4: new patch
---
 block/qcow2-cluster.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 78fbe34..9aab6dd 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1557,11 +1557,10 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);

-    /* The caller must cluster-align start; round end down except at EOF */
+    /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
-        end_offset = start_of_cluster(s, end_offset);
-    }
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

     nb_clusters = size_to_clusters(s, end_offset - offset);

@@ -1644,9 +1643,17 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
                         int flags)
 {
     BDRVQcow2State *s = bs->opaque;
+    uint64_t end_offset;
     uint64_t nb_clusters;
     int ret;

+    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
+
+    /* Caller must pass aligned values, except at image end */
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+
     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
         return -ENOTSUP;
-- 
2.9.3

* [Qemu-devel] [PATCH v12 10/10] qcow2: Discard/zero clusters by byte count
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (8 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 09/10] qcow2: Assert that cluster operations are aligned Eric Blake
@ 2017-05-04  3:07 ` Eric Blake
  2017-05-05 22:18 ` [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Max Reitz
  10 siblings, 0 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-04  3:07 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
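
As a rough illustration of the alignment contract that both renamed
functions keep asserting, here is a minimal standalone sketch.  The
helper names is_aligned() and cluster_range_ok() are hypothetical and
not part of QEMU; image_size stands in for
bs->total_sectors << BDRV_SECTOR_BITS, and the cluster size is assumed
to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of QEMU: true if value is a multiple of
 * align, where align is assumed to be a power of two. */
static bool is_aligned(uint64_t value, uint64_t align)
{
    return (value & (align - 1)) == 0;
}

/* Mirrors the two assertions in the patch: the start must be
 * cluster-aligned, and the end must be cluster-aligned unless it
 * coincides with the end of the image. */
static bool cluster_range_ok(uint64_t offset, uint64_t bytes,
                             uint64_t cluster_size, uint64_t image_size)
{
    uint64_t end_offset = offset + bytes;

    return is_aligned(offset, cluster_size) &&
           (is_aligned(end_offset, cluster_size) ||
            end_offset == image_size);
}
```

With 64k clusters and a 100000-byte image, a request covering
[65536, 100000) passes because its unaligned end is the image end,
while [0, 1000) would trip the assertion.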

The internal functions still operate on whole clusters at a time,
and return an int for the number of cleared clusters; but on an
image with 2M clusters, a single L2 table holds 256k entries that
each represent a 2M cluster, totalling well over INT_MAX bytes if
we ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit byte counts (and therefore
fewer clusters), but by making these functions 64-bit clean, we
have one less place to clean up if we later improve the block layer
to support 64-bit byte counts through all operations (with the
block layer auto-fragmenting on behalf of more-limited drivers),
rather than the current state where some interfaces are
artificially limited to INT_MAX at a time.
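
The arithmetic behind the 2M-cluster figure can be checked with a small
sketch.  It assumes one L2 table fills exactly one cluster with 8-byte
entries, which is consistent with the 256k figure above; the helper
names l2_entries(), l2_table_coverage() and advance() are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: number of 8-byte entries in one cluster-sized
 * L2 table (assumption: an L2 table occupies exactly one cluster). */
static uint64_t l2_entries(uint64_t cluster_size)
{
    return cluster_size / sizeof(uint64_t);
}

/* Guest bytes addressed by a single L2 table. */
static uint64_t l2_table_coverage(uint64_t cluster_size)
{
    return l2_entries(cluster_size) * cluster_size;
}

/* Advance an offset by a cluster count using 64-bit math throughout,
 * in the spirit of "offset += (cleared * s->cluster_size)" once
 * cleared is an int64_t rather than an int. */
static uint64_t advance(uint64_t offset, int64_t cleared,
                        uint64_t cluster_size)
{
    return offset + (uint64_t)cleared * cluster_size;
}
```

For a 2M cluster this gives 262144 entries and 2^39 bytes (512G) per
L2 table, far beyond INT_MAX, which is why a 32-bit intermediate in the
offset computation would not be safe for a full-table request.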

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v12: minor tweaks suggested by Max
v11: reserved for blkdebug half of v10
v10: squash in fixup accounting for unaligned file end
v9: rebase to earlier changes, drop R-b
v7, v8: only earlier half of series submitted for 2.9
v6: rebase due to context
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2.h          |  9 +++++----
 block/qcow2-cluster.c  | 42 ++++++++++++++++++++++--------------------
 block/qcow2-snapshot.c |  7 +++----
 block/qcow2.c          | 22 +++++++++-------------
 4 files changed, 39 insertions(+), 41 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 142f81b..e150e2a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -551,10 +551,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                          int compressed_size);

 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags);

 int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9aab6dd..a47ac9f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1547,34 +1547,36 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
            end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

-    nb_clusters = size_to_clusters(s, end_offset - offset);
+    nb_clusters = size_to_clusters(s, bytes);

     s->cache_discards = true;

     /* Each L2 table is handled by its own loop iteration */
     while (nb_clusters > 0) {
-        ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
-        if (ret < 0) {
+        cleared = discard_single_l2(bs, offset, nb_clusters, type,
+                                    full_discard);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
@@ -1639,16 +1641,15 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
@@ -1660,18 +1661,19 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     }

     /* Each L2 table is handled by its own loop iteration */
-    nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
+    nb_clusters = size_to_clusters(s, bytes);

     s->cache_discards = true;

     while (nb_clusters > 0) {
-        ret = zero_single_l2(bs, offset, nb_clusters, flags);
-        if (ret < 0) {
+        cleared = zero_single_l2(bs, offset, nb_clusters, flags);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0324243..44243e0 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -440,10 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)

     /* The VM state isn't needed any more in the active L1 table; in fact, it
      * hurts by causing expensive COW for the next snapshot. */
-    qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
-                           align_offset(sn->vm_state_size, s->cluster_size)
-                                >> BDRV_SECTOR_BITS,
-                           QCOW2_DISCARD_NEVER, false);
+    qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
+                          align_offset(sn->vm_state_size, s->cluster_size),
+                          QCOW2_DISCARD_NEVER, false);

 #ifdef DEBUG_ALLOC
     {
diff --git a/block/qcow2.c b/block/qcow2.c
index 3478bb6..ce571ea 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2512,7 +2512,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);

     /* Whatever is left can use real zero clusters */
-    ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
+    ret = qcow2_cluster_zeroize(bs, offset, count, flags);
     qemu_co_mutex_unlock(&s->lock);

     return ret;
@@ -2535,8 +2535,8 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     }

     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
-                                 QCOW2_DISCARD_REQUEST, false);
+    ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
+                                false);
     qemu_co_mutex_unlock(&s->lock);
     return ret;
 }
@@ -2843,9 +2843,8 @@ fail:
 static int qcow2_make_empty(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t start_sector;
-    int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
-                       BDRV_SECTOR_SIZE);
+    uint64_t offset, end_offset;
+    int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
     int l1_clusters, ret = 0;

     l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2862,18 +2861,15 @@ static int qcow2_make_empty(BlockDriverState *bs)

     /* This fallback code simply discards every active cluster; this is slow,
      * but works in all cases */
-    for (start_sector = 0; start_sector < bs->total_sectors;
-         start_sector += sector_step)
-    {
+    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
+    for (offset = 0; offset < end_offset; offset += step) {
         /* As this function is generally used after committing an external
          * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
          * default action for this kind of discard is to pass the discard,
          * which will ideally result in an actually smaller image file, as
          * is probably desired. */
-        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
-                                     MIN(sector_step,
-                                         bs->total_sectors - start_sector),
-                                     QCOW2_DISCARD_SNAPSHOT, true);
+        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
+                                    QCOW2_DISCARD_SNAPSHOT, true);
         if (ret < 0) {
             break;
         }
-- 
2.9.3
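
The byte-based fallback loop in qcow2_make_empty() can be sketched
outside of QEMU roughly as follows.  This is only an illustration under
stated assumptions: for_each_chunk(), align_down(), chunk_fn and the
counting callback are hypothetical names, the callback stands in for
qcow2_cluster_discard(), and step is kept in a uint64_t here rather
than the int used in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical callback standing in for qcow2_cluster_discard(). */
typedef int (*chunk_fn)(uint64_t offset, uint64_t bytes, void *opaque);

/* Round value down to a multiple of align. */
static uint64_t align_down(uint64_t value, uint64_t align)
{
    return value / align * align;
}

/* Walk [0, end_offset) in cluster-aligned steps of at most INT_MAX
 * bytes; only the final chunk may end on an unaligned image end. */
static int for_each_chunk(uint64_t end_offset, uint64_t cluster_size,
                          chunk_fn fn, void *opaque)
{
    uint64_t step = align_down(INT_MAX, cluster_size);
    uint64_t offset;
    int ret = 0;

    for (offset = 0; offset < end_offset; offset += step) {
        uint64_t remaining = end_offset - offset;
        uint64_t bytes = remaining < step ? remaining : step;

        ret = fn(offset, bytes, opaque);
        if (ret < 0) {
            break;
        }
    }
    return ret;
}

/* Example callback: count the calls and the bytes covered. */
struct chunk_stats {
    unsigned calls;
    uint64_t total;
};

static int count_chunk(uint64_t offset, uint64_t bytes, void *opaque)
{
    struct chunk_stats *st = opaque;

    (void)offset;
    st->calls++;
    st->total += bytes;
    return 0;
}
```

Because step is a multiple of the cluster size, every chunk start stays
cluster-aligned, so each call satisfies the alignment assertions as
long as the last chunk ends exactly at the image end.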


* Re: [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation Eric Blake
@ 2017-05-05 19:42   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 19:42 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf


On 04.05.2017 05:07, Eric Blake wrote:
> Fix a couple of inconsistent indentations, before an upcoming
> patch further tweaks the switch statements.  While at it, make
> some tweaks for shorter lines to keep checkpatch happy (best
> viewed with 'git diff -b').
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: new patch
> ---
>  block/qcow2-cluster.c  | 32 ++++++++---------
>  block/qcow2-refcount.c | 96 ++++++++++++++++++++++++++------------------------
>  2 files changed, 65 insertions(+), 63 deletions(-)

[...]

> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 4efca7e..a5a0076 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1117,70 +1117,72 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>                  goto fail;
>              }
> 
> -            for(j = 0; j < s->l2_size; j++) {
> +            for (j = 0; j < s->l2_size; j++) {
>                  uint64_t cluster_index;
> +                uint64_t offset_masked;
> 
>                  offset = be64_to_cpu(l2_table[j]);
>                  old_offset = offset;
> +                offset_masked = offset & L2E_OFFSET_MASK;

I'd rather rename "offset" to "entry" (or "l2_entry") and name this just
"offset". Much less confusing.

Also, I really wouldn't mind splitting the non-indentation changes off
into their own patch.

Tentative

Reviewed-by: Max Reitz <mreitz@redhat.com>

>                  offset &= ~QCOW_OFLAG_COPIED;
> 
>                  switch (qcow2_get_cluster_type(offset)) {
> -                    case QCOW2_CLUSTER_COMPRESSED:
> -                        nb_csectors = ((offset >> s->csize_shift) &
> -                                       s->csize_mask) + 1;
> -                        if (addend != 0) {
> -                            ret = update_refcount(bs,
> -                                (offset & s->cluster_offset_mask) & ~511,
> -                                nb_csectors * 512, abs(addend), addend < 0,
> -                                QCOW2_DISCARD_SNAPSHOT);
> -                            if (ret < 0) {
> -                                goto fail;
> -                            }
> -                        }
> -                        /* compressed clusters are never modified */
> -                        refcount = 2;
> -                        break;
> -
> -                    case QCOW2_CLUSTER_NORMAL:
> -                    case QCOW2_CLUSTER_ZERO:
> -                        if (offset_into_cluster(s, offset & L2E_OFFSET_MASK)) {
> -                            qcow2_signal_corruption(bs, true, -1, -1, "Data "
> -                                                    "cluster offset %#llx "
> -                                                    "unaligned (L2 offset: %#"
> -                                                    PRIx64 ", L2 index: %#x)",
> -                                                    offset & L2E_OFFSET_MASK,
> -                                                    l2_offset, j);
> -                            ret = -EIO;
> +                case QCOW2_CLUSTER_COMPRESSED:
> +                    nb_csectors = ((offset >> s->csize_shift) &
> +                                   s->csize_mask) + 1;
> +                    if (addend != 0) {
> +                        ret = update_refcount(bs,
> +                            (offset & s->cluster_offset_mask) & ~511,
> +                            nb_csectors * 512, abs(addend), addend < 0,
> +                            QCOW2_DISCARD_SNAPSHOT);
> +                        if (ret < 0) {
>                              goto fail;
>                          }
> +                    }
> +                    /* compressed clusters are never modified */
> +                    refcount = 2;
> +                    break;
> 
> -                        cluster_index = (offset & L2E_OFFSET_MASK) >> s->cluster_bits;
> -                        if (!cluster_index) {
> -                            /* unallocated */
> -                            refcount = 0;
> -                            break;
> -                        }
> -                        if (addend != 0) {
> -                            ret = qcow2_update_cluster_refcount(bs,
> +                case QCOW2_CLUSTER_NORMAL:
> +                case QCOW2_CLUSTER_ZERO:
> +                    if (offset_into_cluster(s, offset_masked)) {
> +                        qcow2_signal_corruption(bs, true, -1, -1, "Data "
> +                                                "cluster offset %#" PRIx64
> +                                                " unaligned (L2 offset: %#"
> +                                                PRIx64 ", L2 index: %#x)",
> +                                                offset_masked,
> +                                                l2_offset, j);
> +                        ret = -EIO;
> +                        goto fail;
> +                    }
> +
> +                    cluster_index = offset_masked >> s->cluster_bits;
> +                    if (!cluster_index) {
> +                        /* unallocated */
> +                        refcount = 0;
> +                        break;
> +                    }
> +                    if (addend != 0) {
> +                        ret = qcow2_update_cluster_refcount(bs,
>                                      cluster_index, abs(addend), addend < 0,
>                                      QCOW2_DISCARD_SNAPSHOT);
> -                            if (ret < 0) {
> -                                goto fail;
> -                            }
> -                        }
> -
> -                        ret = qcow2_get_refcount(bs, cluster_index, &refcount);
>                          if (ret < 0) {
>                              goto fail;
>                          }
> -                        break;
> +                    }
> 
> -                    case QCOW2_CLUSTER_UNALLOCATED:
> -                        refcount = 0;
> -                        break;
> +                    ret = qcow2_get_refcount(bs, cluster_index, &refcount);
> +                    if (ret < 0) {
> +                        goto fail;
> +                    }
> +                    break;
> 
> -                    default:
> -                        abort();
> +                case QCOW2_CLUSTER_UNALLOCATED:
> +                    refcount = 0;
> +                    break;
> +
> +                default:
> +                    abort();
>                  }
> 
>                  if (refcount == 1) {
> 





* Re: [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
@ 2017-05-05 20:06   ` Max Reitz
  2017-05-05 20:13     ` Eric Blake
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2017-05-05 20:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf


On 04.05.2017 05:07, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  Furthermore, the
> text refers a lot to bs->file, even though the interface was
> updated back in 67a0fd2a to let the driver pass back which BDS (not
> necessarily bs->file).

Not sure about my English skills here, but is this missing a verb? ("to
pass back which BDS...")

>                         As the 8-way table is the intended
> semantics, simplify the rest of the text to get rid of the
> confusion.
> 
> ALLOCATED is always set by the block layer for convenience (drivers
> do not have to worry about it). RAW is used only internally, but

Just one space after the period? How inconsistent! :-)

> by more than the raw driver.  Document these additional items on
> the driver callback.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: even more wording tweaks
> v11: reserved for blkdebug half of v10
> v10: new patch
> ---
>  include/block/block.h     | 35 +++++++++++++++++++----------------
>  include/block/block_int.h |  7 +++++++
>  2 files changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 862eb56..c8bec7d 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -120,29 +120,32 @@ typedef struct HDGeometry {
>  #define BDRV_REQUEST_MAX_BYTES (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)
> 
>  /*
> - * Allocation status flags
> - * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status.
> - * BDRV_BLOCK_ZERO: sectors read as zero
> - * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
> - *                          bdrv_get_block_status.
> + * Allocation status flags for bdrv_get_block_status() and friends.
> + *
> + * Public flags:
> + * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
> + * BDRV_BLOCK_ZERO: offset reads as zero
> + * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
> - *                       layer (as opposed to the backing file)
> - * BDRV_BLOCK_RAW: used internally to indicate that the request
> - *                 was answered by the raw driver and that one
> - *                 should look in bs->file directly.
> + *                       layer (short for DATA || ZERO), set by block layer
>   *
> - * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
> - * bs->file where sector data can be read from as raw data.
> + * Internal flag:
> + * BDRV_BLOCK_RAW: used internally to indicate that the request was
> + *                 answered by a passthrough driver such as raw and that the

s/passthrough/filter/? But I'm not even sure myself. Well, "passthrough"
is a safe bet, so let's just go with it.

With the commit message fixed or a “no it's fine”:

Reviewed-by: Max Reitz <mreitz@redhat.com>

> + *                 block layer should recompute the answer from bs->file.
>   *
> - * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
> + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 (BDRV_BLOCK_OFFSET_MASK)
> + * represent the offset in the returned BDS that is allocated for the
> + * corresponding raw data; however, whether that offset actually contains
> + * data also depends on BDRV_BLOCK_DATA and BDRV_BLOCK_ZERO, as follows:
>   *
>   * DATA ZERO OFFSET_VALID
> - *  t    t        t       sectors read as zero, bs->file is zero at offset
> - *  t    f        t       sectors read as valid from bs->file at offset
> - *  f    t        t       sectors preallocated, read as zero, bs->file not
> + *  t    t        t       sectors read as zero, returned file is zero at offset
> + *  t    f        t       sectors read as valid from file at offset
> + *  f    t        t       sectors preallocated, read as zero, returned file not
>   *                        necessarily zero at offset
>   *  f    f        t       sectors preallocated but read from backing_hd,
> - *                        bs->file contains garbage at offset
> + *                        returned file contains garbage at offset
>   *  t    t        f       sectors preallocated, read as zero, unknown offset
>   *  t    f        f       sectors read from unknown file or offset
>   *  f    t        f       not allocated or unknown offset, read as zero
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 8773940..1fdfff7 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -165,6 +165,13 @@ struct BlockDriver {
>          int64_t offset, int count, BdrvRequestFlags flags);
>      int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs,
>          int64_t offset, int count);
> +
> +    /*
> +     * Building block for bdrv_block_status[_above]. The driver should
> +     * answer only according to the current layer, and should not
> +     * set BDRV_BLOCK_ALLOCATED, but may set BDRV_BLOCK_RAW.  See block.h
> +     * for the meaning of _DATA, _ZERO, and _OFFSET_VALID.
> +     */
>      int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum,
>          BlockDriverState **file);
> 
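
To make the quoted table easier to follow, here is a small sketch that
maps a (DATA, ZERO, OFFSET_VALID) combination to the meaning given in
the updated comment.  describe_status() and status_allocated() are
hypothetical helpers, not QEMU functions, and take plain booleans
rather than the real flag bits.  Only the seven rows visible in the
quoted hunk are encoded; the eighth combination falls outside the hunk
and is left as a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: text of the table row for one flag combination,
 * paraphrased from the comment quoted above. */
static const char *describe_status(bool data, bool zero, bool offset_valid)
{
    if (offset_valid) {
        if (data && zero) {
            return "read as zero, returned file is zero at offset";
        }
        if (data) {
            return "read as valid from file at offset";
        }
        if (zero) {
            return "preallocated, read as zero, returned file not "
                   "necessarily zero at offset";
        }
        return "preallocated but read from backing_hd, returned file "
               "contains garbage at offset";
    }
    if (data && zero) {
        return "preallocated, read as zero, unknown offset";
    }
    if (data) {
        return "read from unknown file or offset";
    }
    if (zero) {
        return "not allocated or unknown offset, read as zero";
    }
    return "(row not shown in the quoted hunk)";
}

/* Per the updated comment, ALLOCATED is short for DATA || ZERO and is
 * set by the block layer rather than by the driver. */
static bool status_allocated(bool data, bool zero)
{
    return data || zero;
}
```

For instance, describe_status(false, true, true) returns the
"preallocated, read as zero" row, the case where an offset is reported
but reading it directly may not yield zeroes.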





* Re: [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings
  2017-05-05 20:06   ` Max Reitz
@ 2017-05-05 20:13     ` Eric Blake
  2017-05-05 20:23       ` Max Reitz
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-05 20:13 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf


On 05/05/2017 03:06 PM, Max Reitz wrote:
> On 04.05.2017 05:07, Eric Blake wrote:
>> We had some conflicting documentation: a nice 8-way table that
>> described all possible combinations of DATA, ZERO, and
>> OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
>> always meant raw data could be read directly.  Furthermore, the
>> text refers a lot to bs->file, even though the interface was
>> updated back in 67a0fd2a to let the driver pass back which BDS (not
>> necessarily bs->file).
> 
> Not sure about my English skills here, but is this missing a verb? ("to
> pass back which BDS...")

maybe s/which/a specific/

> 
>>                         As the 8-way table is the intended
>> semantics, simplify the rest of the text to get rid of the
>> confusion.
>>
>> ALLOCATED is always set by the block layer for convenience (drivers
>> do not have to worry about it). RAW is used only internally, but
> 
> Just one space after the period? How inconsistent! :-)

But do commit messages really count?  :)


>> + * Internal flag:
>> + * BDRV_BLOCK_RAW: used internally to indicate that the request was
>> + *                 answered by a passthrough driver such as raw and that the
> 
> s/passthrough/filter/? But I'm not even sure myself. Well, "passthrough"
> is a safe bet, so let's just go with it.

Calling 'raw' a filter driver is a bit weird - but in one sense, it is a
no-op filter (filter the protocol layer into the format layer by doing
nothing).  Meanwhile 'commit' certainly sounds like more of a filter
than a passthrough.  I could go either way, and filter is slightly
shorter.  If there's a real reason to respin the series, 'filter' seems
reasonable if we're worried about line length, otherwise I'm just as
inclined to leave it alone.

> 
> With the commit message fixed or a “no it's fine”:
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings
  2017-05-05 20:13     ` Eric Blake
@ 2017-05-05 20:23       ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 20:23 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf


On 05.05.2017 22:13, Eric Blake wrote:
> On 05/05/2017 03:06 PM, Max Reitz wrote:
>> On 04.05.2017 05:07, Eric Blake wrote:
>>> We had some conflicting documentation: a nice 8-way table that
>>> described all possible combinations of DATA, ZERO, and
>>> OFFSET_VALID, contrasted with text that implied that OFFSET_VALID
>>> always meant raw data could be read directly.  Furthermore, the
>>> text refers a lot to bs->file, even though the interface was
>>> updated back in 67a0fd2a to let the driver pass back which BDS (not
>>> necessarily bs->file).
>>
>> Not sure about my English skills here, but is this missing a verb? ("to
>> pass back which BDS...")
> 
> maybe s/which/a specific/

Or that, yes. :-)

> 
>>
>>>                         As the 8-way table is the intended
>>> semantics, simplify the rest of the text to get rid of the
>>> confusion.
>>>
>>> ALLOCATED is always set by the block layer for convenience (drivers
>>> do not have to worry about it). RAW is used only internally, but
>>
>> Just one space after the period? How inconsistent! :-)
> 
> But do commit messages really count?  :)

It's a critical bug, I'm telling you!!!

>>> + * Internal flag:
>>> + * BDRV_BLOCK_RAW: used internally to indicate that the request was
>>> + *                 answered by a passthrough driver such as raw and that the
>>
>> s/passthrough/filter/? But I'm not even sure myself. Well, "passthrough"
>> is a safe bet, so let's just go with it.
> 
> Calling 'raw' a filter driver is a bit weird - but in one sense, it is a
> no-op filter (filter the protocol layer into the format layer by doing
> nothing).  Meanwhile 'commit' certainly sounds like more of a filter
> than a passthrough.  I could go either way, and filter is slightly
> shorter.  If there's a real reason to respin the series, 'filter' seems
> reasonable if we're worried about line length, otherwise I'm just as
> inclined to leave it alone.

raw certainly is a filter driver; what I wasn't sure about is whether
filter drivers are required to set this flag. But neither the comment
nor the code necessarily requires it, so using "filter" instead of
"passthrough" should be OK.

The main reason for using "filter" over "passthrough" is that "filter"
is a "real" class of block drivers (just "real" in the sense that we
actually only have child-less protocol drivers and non-protocol drivers
that do have children; further dividing into "format" and "filter" is
just a convention).

But it should be clear anyway, so I don't mind either way. Leaving it as
it is certainly is simpler.

Max

> 
>>
>> With the commit message fixed or a “no it's fine”:
>>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
> 





* Re: [Qemu-devel] [PATCH v12 03/10] qcow2: Correctly report status of preallocated zero clusters
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 03/10] qcow2: Correctly report status of preallocated zero clusters Eric Blake
@ 2017-05-05 20:24   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 20:24 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf


On 04.05.2017 05:07, Eric Blake wrote:
> We were throwing away the preallocation information associated with
> zero clusters.  But we should be matching the well-defined semantics
> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
> while still reminding the user that reading from that offset is
> likely to read garbage.
> 
> count_contiguous_clusters_by_type() is now used only for unallocated
> cluster runs, hence it gets renamed and tightened.
> 
> Making this change lets us see which portions of an image are zero
> but preallocated, when using qemu-img map --output=json.  The
> --output=human side intentionally ignores all zero clusters, whether
> or not they are preallocated.
> 
> The fact that there is no change to qemu-iotests './check -qcow2'
> merely means that we aren't yet testing this aspect of qemu-img;
> a later patch will add a test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: rename helper function
> v11: reserved for blkdebug half of v10
> v10: new patch
> ---
>  block/qcow2-cluster.c | 45 +++++++++++++++++++++++++++++++++++----------
>  1 file changed, 35 insertions(+), 10 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious Eric Blake
@ 2017-05-05 20:51   ` Max Reitz
  2017-05-06 20:30     ` Eric Blake
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2017-05-05 20:51 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf


On 04.05.2017 05:07, Eric Blake wrote:
> Treat plain zero clusters differently from allocated ones, so that
> we can simplify the logic of checking whether an offset is present.
> Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
> QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.
> 
> I tried to arrange the enum so that we could use
> 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
> 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
> I didn't actually end up taking advantage of the layout.
> 
> In many cases, this leads to simpler code, by properly combining
> cases (sometimes, both zero types pair together, other times,
> plain zero is more like unallocated while allocated zero is more
> like normal).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: new patch
> ---
>  block/qcow2.h          |  8 +++++--
>  block/qcow2-cluster.c  | 65 ++++++++++++++++++--------------------------------
>  block/qcow2-refcount.c | 40 +++++++++++++------------------
>  block/qcow2.c          |  9 ++++---
>  4 files changed, 51 insertions(+), 71 deletions(-)

I have to admit I was rather skeptical of this idea at first (because I
thought we would have more places which treat both ZERO types the same
than those that separate them), but you have comprehensively proven me wrong.

Some nit picks below, I'll leave it completely up to you whether you
want to address them:

Reviewed-by: Max Reitz <mreitz@redhat.com>

[...]

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index f3bfce6..14e2086 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c

[...]

> @@ -558,52 +557,32 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>      assert(nb_clusters <= INT_MAX);
> 
>      ret = qcow2_get_cluster_type(*cluster_offset);
> +    if (s->qcow_version < 3 && (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
> +                                ret == QCOW2_CLUSTER_ZERO_ALLOC)) {
> +        qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
> +                                " in pre-v3 image (L2 offset: %#" PRIx64
> +                                ", L2 index: %#x)", l2_offset, l2_index);
> +        ret = -EIO;
> +        goto fail;
> +    }
>      switch (ret) {
>      case QCOW2_CLUSTER_COMPRESSED:
>          /* Compressed clusters can only be processed one by one */
>          c = 1;
>          *cluster_offset &= L2E_COMPRESSED_OFFSET_SIZE_MASK;
>          break;
> -    case QCOW2_CLUSTER_ZERO:
> -        if (s->qcow_version < 3) {
> -            qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
> -                                    " in pre-v3 image (L2 offset: %#" PRIx64
> -                                    ", L2 index: %#x)", l2_offset, l2_index);
> -            ret = -EIO;
> -            goto fail;
> -        }
> -        /* Distinguish between pure zero clusters and pre-allocated ones */
> -        if (*cluster_offset & L2E_OFFSET_MASK) {
> -            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> -                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
> -            *cluster_offset &= L2E_OFFSET_MASK;
> -            if (offset_into_cluster(s, *cluster_offset)) {
> -                qcow2_signal_corruption(bs, true, -1, -1,
> -                                        "Preallocated zero cluster offset %#"
> -                                        PRIx64 " unaligned (L2 offset: %#"
> -                                        PRIx64 ", L2 index: %#x)",
> -                                        *cluster_offset, l2_offset, l2_index);
> -                ret = -EIO;
> -                goto fail;
> -            }
> -        } else {
> -            c = count_contiguous_clusters_unallocated(nb_clusters,
> -                                                      &l2_table[l2_index],
> -                                                      QCOW2_CLUSTER_ZERO);
> -            *cluster_offset = 0;
> -        }
> -        break;
> +    case QCOW2_CLUSTER_ZERO_PLAIN:
>      case QCOW2_CLUSTER_UNALLOCATED:
>          /* how many empty clusters ? */
>          c = count_contiguous_clusters_unallocated(nb_clusters,
> -                                                  &l2_table[l2_index],
> -                                                  QCOW2_CLUSTER_UNALLOCATED);
> +                                                  &l2_table[l2_index], ret);

Nit pick: Using ret here is a bit weird (because it's such a meaningless
name). It would be good if we had a separate cluster_type variable.
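
For illustration only, here is a minimal standalone sketch of what the
classification could look like with a dedicated cluster_type variable.
The flag and mask values are my reading of the qcow2 L2 entry layout for
standard clusters, and all sketch_/SKETCH_ names are hypothetical; they
are not the identifiers from block/qcow2.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed L2 entry layout: bit 0 = reads as zero, bits 9-55 = host
 * cluster offset, bit 62 = compressed cluster. */
#define SKETCH_OFLAG_ZERO       (1ULL << 0)
#define SKETCH_OFLAG_COMPRESSED (1ULL << 62)
#define SKETCH_L2E_OFFSET_MASK  0x00fffffffffffe00ULL

/* Ordered so that every type up to ZERO_PLAIN has no host offset. */
typedef enum SketchClusterType {
    SKETCH_CLUSTER_UNALLOCATED,
    SKETCH_CLUSTER_ZERO_PLAIN,
    SKETCH_CLUSTER_ZERO_ALLOC,
    SKETCH_CLUSTER_NORMAL,
    SKETCH_CLUSTER_COMPRESSED,
} SketchClusterType;

/* Classify one L2 entry (already converted to host byte order). */
SketchClusterType sketch_get_cluster_type(uint64_t l2_entry)
{
    if (l2_entry & SKETCH_OFLAG_COMPRESSED) {
        return SKETCH_CLUSTER_COMPRESSED;
    }
    if (l2_entry & SKETCH_OFLAG_ZERO) {
        return (l2_entry & SKETCH_L2E_OFFSET_MASK)
            ? SKETCH_CLUSTER_ZERO_ALLOC : SKETCH_CLUSTER_ZERO_PLAIN;
    }
    return (l2_entry & SKETCH_L2E_OFFSET_MASK)
        ? SKETCH_CLUSTER_NORMAL : SKETCH_CLUSTER_UNALLOCATED;
}

/* A named variable makes the ordering trick from the commit message
 * readable: anything at or above ZERO_ALLOC carries a host offset. */
bool sketch_has_host_offset(uint64_t l2_entry)
{
    SketchClusterType cluster_type = sketch_get_cluster_type(l2_entry);

    return cluster_type >= SKETCH_CLUSTER_ZERO_ALLOC;
}
```

The point is only that "cluster_type" documents itself where "ret" does
not; the patch itself does not rely on the enum ordering.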

>          *cluster_offset = 0;
>          break;
> +    case QCOW2_CLUSTER_ZERO_ALLOC:
>      case QCOW2_CLUSTER_NORMAL:
>          /* how many allocated clusters ? */
>          c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> -                &l2_table[l2_index], QCOW_OFLAG_ZERO);
> +                                      &l2_table[l2_index], QCOW_OFLAG_ZERO);
>          *cluster_offset &= L2E_OFFSET_MASK;
>          if (offset_into_cluster(s, *cluster_offset)) {
>              qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset %#"

Well, preallocated zero clusters are not exactly data clusters... Not
that any user cared, but s/Data cluster/Cluster allocation/ would be
more correct.

By the way, allow me to state just how much I love this hunk: Very much.
Looks great! It gets a place on my list of favorite hunks of this year
at least.

[...]

> @@ -1760,7 +1740,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>              int cluster_type = qcow2_get_cluster_type(l2_entry);
>              bool preallocated = offset != 0;

I could get behind removing this variable and replacing all
"if (!preallocated)" instances by
"if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN)". Up to you, though.

Max

> 
> -            if (cluster_type != QCOW2_CLUSTER_ZERO) {
> +            if (cluster_type != QCOW2_CLUSTER_ZERO_PLAIN &&
> +                cluster_type != QCOW2_CLUSTER_ZERO_ALLOC) {
>                  continue;
>              }
>



* Re: [Qemu-devel] [PATCH v12 05/10] qcow2: Optimize zero_single_l2() to minimize L2 churn
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 05/10] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
@ 2017-05-05 20:55   ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 20:55 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 3695 bytes --]

On 04.05.2017 05:07, Eric Blake wrote:
> Similar to discard_single_l2(), we should try to avoid dirtying
> the L2 cache when the cluster we are changing already has the
> right characteristics.
> 
> Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
> is a requirement to unallocate a cluster (this is because the block
> layer clears that flag if discard.* flags during open requested that
> we never punch holes - see the conversation around commit 170f4b2e,
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
> Therefore, this patch can only reuse a zero cluster as-is if either
> unmapping is not requested, or if the zero cluster was not associated
> with an allocation.
> 
> Technically, there are some cases where an unallocated cluster
> already reads as all zeroes (namely, when there is no backing file
> [easy: check bs->backing], or when the backing file also reads as
> zeroes [harder: we can't check bdrv_get_block_status since we are
> already holding the lock]), where the guest would not immediately see
> a difference if we left that cluster unallocated.  But if the user
> did not request unmapping, leaving an unallocated cluster is wrong;
> and even if the user DID request unmapping, keeping a cluster
> unallocated risks a subtle semantic change of guest-visible contents
> if a backing file is later added, and it is not worth auditing
> whether all internal uses such as mirror properly avoid an unmap
> request.  Thus, this patch is intentionally limited to just clusters
> that are already marked as zero.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: store cluster type in temporary
> v11: reserved for blkdebug half of v10
> v10: new patch, replacing earlier attempt to use unallocated clusters,
> and ditching any optimization of v2 files
> ---
>  block/qcow2-cluster.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>
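
For readers following along, the skip rule from the commit message can
be written as a small decision table. This is a hedged sketch rather
than the code in zero_single_l2(): the Demo names are hypothetical, and
the third outcome (setting the zero flag on the existing entry) is my
assumption about the branch that the quoted hunk cuts off.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum DemoClusterType {
    DEMO_UNALLOCATED,
    DEMO_ZERO_PLAIN,
    DEMO_ZERO_ALLOC,
    DEMO_NORMAL,
    DEMO_COMPRESSED,
} DemoClusterType;

typedef enum DemoZeroAction {
    DEMO_SKIP,          /* entry already correct, L2 cache stays clean */
    DEMO_ZERO_AND_FREE, /* entry becomes plain zero, old cluster freed */
    DEMO_SET_ZERO_FLAG, /* assumed: zero flag set on existing entry */
} DemoZeroAction;

/* Decide what a write-zeroes request does to one cluster.  'unmap'
 * stands for the request carrying BDRV_REQ_MAY_UNMAP. */
DemoZeroAction demo_zero_action(DemoClusterType cluster_type, bool unmap)
{
    /* Already reads as zero with the allocation state we want. */
    if (cluster_type == DEMO_ZERO_PLAIN ||
        (cluster_type == DEMO_ZERO_ALLOC && !unmap)) {
        return DEMO_SKIP;
    }
    /* Compressed clusters cannot keep their allocation; with unmap
     * the caller asked for the allocation to go away. */
    if (cluster_type == DEMO_COMPRESSED || unmap) {
        return DEMO_ZERO_AND_FREE;
    }
    return DEMO_SET_ZERO_FLAG;
}
```

Note that an unallocated cluster is never skipped, which matches the
last paragraph of the commit message.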

> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 14e2086..78fbe34 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1599,6 +1599,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>      int l2_index;
>      int ret;
>      int i;
> +    bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
> 
>      ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
>      if (ret < 0) {
> @@ -1611,12 +1612,22 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
> 
>      for (i = 0; i < nb_clusters; i++) {
>          uint64_t old_offset;
> +        int cluster_type;

Hm, why doesn't this enum have a name yet... Well, if only *someone*
could address that. */me ducks*

Max

> 
>          old_offset = be64_to_cpu(l2_table[l2_index + i]);
> 
> -        /* Update L2 entries */
> +        /*
> +         * Minimize L2 changes if the cluster already reads back as
> +         * zeroes with correct allocation.
> +         */
> +        cluster_type = qcow2_get_cluster_type(old_offset);
> +        if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
> +            (cluster_type == QCOW2_CLUSTER_ZERO_ALLOC && !unmap)) {
> +            continue;
> +        }
> +
>          qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
> -        if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
> +        if (cluster_type == QCOW2_CLUSTER_COMPRESSED || unmap) {
>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>              qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>          } else {
> 




* Re: [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map Eric Blake
@ 2017-05-05 20:58   ` Max Reitz
  2017-05-05 21:06     ` Eric Blake
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2017-05-05 20:58 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 2016 bytes --]

On 04.05.2017 05:07, Eric Blake wrote:
> Although _filter_qemu_img_map documents that it scrubs offsets, it
> was only doing so for human mode.  Of the existing tests using the
> filter (97, 122, 150, 154, 176), two of them are affected, but it
> does not hurt the validity of the tests to not require particular
> mappings (another test, 66, uses offsets but intentionally does not
> pass through _filter_qemu_img_map, because it checks that offsets
> are unchanged before and after an operation).
> 
> Another justification for this patch is that it will allow a future
> patch to utilize 'qemu-img map --output=json' to check the status of
> preallocated zero clusters without regard to the mapping (since
> the qcow2 mapping can be very sensitive to the chosen cluster size,
> when preallocation is not in use).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: new patch
> ---
>  tests/qemu-iotests/common.filter |  4 +++-
>  tests/qemu-iotests/122.out       | 16 ++++++++--------
>  tests/qemu-iotests/154.out       | 30 +++++++++++++++---------------
>  3 files changed, 26 insertions(+), 24 deletions(-)
> 
> diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
> index f58548d..2c14f15 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -152,10 +152,12 @@ _filter_img_info()
>          -e "/log_size: [0-9]\\+/d"
>  }
> 
> -# filter out offsets and file names from qemu-img map
> +# filter out offsets and file names from qemu-img map; good for both
> +# human and json output
>  _filter_qemu_img_map()
>  {
>      sed -e 's/\([0-9a-fx]* *[0-9a-fx]* *\)[0-9a-fx]* */\1/g' \
> +	-e 's/"offset": [0-9]\+/"offset": OFFSET/g' \
>          -e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt

This makes my OCD very sad.

With the alignment fixed:

Reviewed-by: Max Reitz <mreitz@redhat.com>

(Without this fixed: Will-change-when-applying-and-then-add-the-R-b)

>  }
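
To spell out what the new sed expression does to JSON output, here is a
rough C equivalent of just that one substitution. It is a sketch under
my own assumptions: scrub_json_offsets() is a hypothetical helper, not
part of the iotests, and it only handles the '"offset": <digits>' form
that the regex targets.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst, replacing the run of decimal digits after every
 * '"offset": ' with the literal OFFSET.  Output is truncated if dst is
 * too small, and is always NUL-terminated. */
void scrub_json_offsets(const char *src, char *dst, size_t dst_size)
{
    static const char key[] = "\"offset\": ";
    const size_t key_len = sizeof(key) - 1;
    size_t out = 0;

    assert(dst_size > 0);
    while (*src && out + 1 < dst_size) {
        if (strncmp(src, key, key_len) == 0 &&
            isdigit((unsigned char)src[key_len])) {
            int n = snprintf(dst + out, dst_size - out, "%sOFFSET", key);

            if (n < 0 || (size_t)n >= dst_size - out) {
                break; /* replacement does not fit, stop here */
            }
            out += (size_t)n;
            src += key_len;
            while (isdigit((unsigned char)*src)) {
                src++; /* drop the concrete offset value */
            }
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
}
```

Other numeric fields such as "start" and "length" pass through
untouched, so tests still check extent boundaries but no longer depend
on where the clusters landed in the image file.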



* Re: [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map
  2017-05-05 20:58   ` Max Reitz
@ 2017-05-05 21:06     ` Eric Blake
  2017-05-05 21:07       ` Max Reitz
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Blake @ 2017-05-05 21:06 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 1567 bytes --]

On 05/05/2017 03:58 PM, Max Reitz wrote:
> On 04.05.2017 05:07, Eric Blake wrote:
>> Although _filter_qemu_img_map documents that it scrubs offsets, it
>> was only doing so for human mode.  Of the existing tests using the
>> filter (97, 122, 150, 154, 176), two of them are affected, but it
>> does not hurt the validity of the tests to not require particular
>> mappings (another test, 66, uses offsets but intentionally does not
>> pass through _filter_qemu_img_map, because it checks that offsets
>> are unchanged before and after an operation).
>>

>>  _filter_qemu_img_map()
>>  {
>>      sed -e 's/\([0-9a-fx]* *[0-9a-fx]* *\)[0-9a-fx]* */\1/g' \
>> +	-e 's/"offset": [0-9]\+/"offset": OFFSET/g' \
>>          -e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt

Whoa - mail corruption.

> 
> This makes my OCD very sad.

The real culprit? TAB damage.  I guess this is one of the files that
emacs thinks should be indented with TAB instead of space, because the
file itself looks aligned, only the patch is botched.

> 
> With the alignment fixed:
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
> (Without this fixed: Will-change-when-applying-and-then-add-the-R-b)

Much appreciated, and I await the final verdict on whether I have to
respin (you've pointed out several cleanups I can make), or whether you
take the fixed version of these patches and let me write a few more
followups.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map
  2017-05-05 21:06     ` Eric Blake
@ 2017-05-05 21:07       ` Max Reitz
  0 siblings, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 21:07 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 1583 bytes --]

On 05.05.2017 23:06, Eric Blake wrote:
> On 05/05/2017 03:58 PM, Max Reitz wrote:
>> On 04.05.2017 05:07, Eric Blake wrote:
>>> Although _filter_qemu_img_map documents that it scrubs offsets, it
>>> was only doing so for human mode.  Of the existing tests using the
>>> filter (97, 122, 150, 154, 176), two of them are affected, but it
>>> does not hurt the validity of the tests to not require particular
>>> mappings (another test, 66, uses offsets but intentionally does not
>>> pass through _filter_qemu_img_map, because it checks that offsets
>>> are unchanged before and after an operation).
>>>
> 
>>>  _filter_qemu_img_map()
>>>  {
>>>      sed -e 's/\([0-9a-fx]* *[0-9a-fx]* *\)[0-9a-fx]* */\1/g' \
>>> +	-e 's/"offset": [0-9]\+/"offset": OFFSET/g' \
>>>          -e 's/Mapped to *//' | _filter_testdir | _filter_imgfmt
> 
> Whoa - mail corruption.
> 
>>
>> This makes my OCD very sad.
> 
> The real culprit? TAB damage.  I guess this is one of the files that
> emacs thinks should be indented with TAB instead of space, because the

>emacs

Found the issue right there.

O:-)

Max

> file itself looks aligned, only the patch is botched.
> 
>>
>> With the alignment fixed:
>>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>
>> (Without this fixed: Will-change-when-applying-and-then-add-the-R-b)
> 
> Much appreciated, and I await the final verdict on whether I have to
> respin (you've pointed out several cleanups I can make), or whether you
> take the fixed version of these patches and let me write a few more
> followups.



* Re: [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
@ 2017-05-05 21:24   ` Max Reitz
  2017-05-05 22:29   ` Max Reitz
  1 sibling, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 21:24 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]

On 04.05.2017 05:07, Eric Blake wrote:
> No tests were covering write zeroes with unmap.  Additionally,
> I needed to prove that my previous patches for correct status
> reporting and write zeroes optimizations actually had an impact.
> 
> The test works for cluster_size between 8k and 2M (for smaller
> sizes, it fails because our allocation patterns are not contiguous
> with small clusters - in part, the largest consecutive allocation
> we tend to get is often bounded by the size covered by one L2
> table).
> 
> Note that testing for zero clusters is tricky: 'qemu-io map'
> reports whether data comes from the current layer of the image
> (useful for sniffing out which regions of the file have
> QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
> while 'qemu-img map' sees "zero":true for both unallocated and
> zero clusters for any qcow2 with no backing layer (so less useful
> at detecting true zero clusters), but reliably shows mappings.
> So we have to rely on both queries side-by-side at each point of
> the test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: probe the map in more places, to make test easier to follow
> v11: reserved for blkdebug half of v10
> v10: drop any changes to v2 files, rewrite test to work with updates
> earlier in the series, add a blkdebug probe
> v9: new patch
> ---
>  tests/qemu-iotests/179     | 132 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/179.out | 160 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 293 insertions(+)
>  create mode 100755 tests/qemu-iotests/179
>  create mode 100644 tests/qemu-iotests/179.out

Reviewed-by: Max Reitz <mreitz@redhat.com>

TIL qemu-io -c w.



* Re: [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
@ 2017-05-05 22:06   ` Max Reitz
  2017-05-05 22:41     ` Eric Blake
  0 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2017-05-05 22:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 2940 bytes --]

On 04.05.2017 05:07, Eric Blake wrote:
> We've already improved discards to operate efficiently on the tail
> of an unaligned qcow2 image; it's time to make a similar improvement
> to write zeroes.  The special case is only valid at the tail
> cluster of a file, where we must recognize that any sectors beyond
> the image end would implicitly read as zero, and therefore should
> not penalize our logic for widening a partial cluster into writing
> the whole cluster as zero.
> 
> However, note that for now, the special case of end-of-file is only
> recognized if there is no backing file, or if the backing file has
> the same length; that's because when the backing file is shorter
> than the active layer, we don't have code in place to recognize
> that reads of a sector unallocated at the top and beyond the backing
> end-of-file are implicitly zero.  It's not much of a real loss,
> because most people don't use images that aren't cluster-aligned,
> or where the active layer is a different size than the backing
> layer (especially where the difference falls within a single cluster).
> 
> Update test 154 to cover the new scenarios, using two images of
> intentionally differing length.
> 
> While at it, fix the test to gracefully skip when run as
> ./check -qcow2 -o compat=0.10 154
> since the older format lacks zero clusters already required earlier
> in the test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: fix testsuite problems, document shortcoming of differing
> size of backing file
> v11: reserved for blkdebug half of v10
> v10: rebase to better reporting of preallocated zero clusters
> v9: new patch
> ---
>  block/qcow2.c              |   7 ++
>  tests/qemu-iotests/154     | 160 ++++++++++++++++++++++++++++++++++++++++++++-
>  tests/qemu-iotests/154.out | 129 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 294 insertions(+), 2 deletions(-)

[...]

> diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
> index 7ca7219..687b8f3 100755
> --- a/tests/qemu-iotests/154
> +++ b/tests/qemu-iotests/154

[...]

> @@ -299,6 +302,159 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io
> 
>  $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> 
> +echo
> +echo == unaligned image tail cluster, no allocation needed ==

[...]

> +# A preallocated cluster maintains its allocation, whether it stays as
> +# data due to a partial write:
> +# Convert 128m... | XX XX => ... | XX 00
> +_make_test_img $((size + 1024))
> +$QEMU_IO -c "write -P 1 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io

s/\.base//, I suppose?

(You should read your reference output. "Pattern verification failed" is
never good. ;-))

With that (and the reference output) fixed:

Reviewed-by: Max Reitz <mreitz@redhat.com>
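
As a reading aid, the tail-cluster rule from the commit message boils
down to the check below. This is only a sketch with hypothetical names:
it ignores the backing-file restriction described above (it assumes no
backing file, or one of the same length) and it is not the code added
to block/qcow2.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true when a write-zeroes request on [offset, offset + bytes)
 * may be handled as a whole-cluster operation.  A request that stops
 * at the end of the image counts as aligned at its tail, because the
 * rest of that cluster lies beyond the image and reads as zero. */
bool sketch_zero_request_is_aligned(uint64_t offset, uint64_t bytes,
                                    uint64_t cluster_size,
                                    uint64_t image_size)
{
    uint64_t head;
    uint64_t tail;

    assert(cluster_size > 0);
    assert(offset + bytes <= image_size);

    head = offset % cluster_size;
    tail = (offset + bytes) % cluster_size;
    if (offset + bytes == image_size) {
        tail = 0; /* nothing beyond the image end can be clobbered */
    }
    return head == 0 && tail == 0;
}
```

With 64k clusters and an image of 128M + 1024 bytes, zeroing the final
1024 bytes passes the check, while zeroing only the first or only the
last 512 of them does not.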



* Re: [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests]
  2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
                   ` (9 preceding siblings ...)
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 10/10] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-05-05 22:18 ` Max Reitz
  2017-05-05 22:43   ` Eric Blake
  10 siblings, 1 reply; 28+ messages in thread
From: Max Reitz @ 2017-05-05 22:18 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

On 04.05.2017 05:07, Eric Blake wrote:
> I've collected several improvements for qcow2 zero-cluster handling.
> 
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v12
> 
> Marked as v12 for "hysterical raisins", since it is the half of
> v10 [1] that was not resubmitted as v11 [2].
> 
> Depends on Max's block tree:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00641.html
> and on Max's qcow2 cleanups:
> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00689.html
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05227.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05896.html
> 
> Changes since last posting:
> - lots of tweaks to resolve Max's review findings, including
> rewriting my additions to test 154
> - a new patch splitting QCOW2_CLUSTER_ZERO that fell out from
> my review of Max's work
> - defer any optimizations of a backing file with different length
> until later (I will still post an RFC patch to explore what
> optimizations a BDRV_BLOCK_EOF would allow, but didn't want to
> hold up this series any further)

I have given an R-b for every patch, so I could apply the series as-is
(with the really minor fixes we have talked about, and the thing in
patch 8), and I would be more or less OK with that. But I did have my
fair share of nit picks and before I discard all of them, I'd like to
know for sure that you don't intend to address them, or address them in
a follow-up.

I'm saying this because some of the issues I had do not really lend
themselves nicely to a follow-up; for instance the
s/Data cluster/Cluster allocation/ for patch 4.

From my perspective, it's not much more difficult to review a respin of
this series with just these minor points changed and maybe some very
small patches (the follow-up, basically) added to it.

Max



* Re: [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap
  2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
  2017-05-05 21:24   ` Max Reitz
@ 2017-05-05 22:29   ` Max Reitz
  1 sibling, 0 replies; 28+ messages in thread
From: Max Reitz @ 2017-05-05 22:29 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 2570 bytes --]

On 04.05.2017 05:07, Eric Blake wrote:
> No tests were covering write zeroes with unmap.  Additionally,
> I needed to prove that my previous patches for correct status
> reporting and write zeroes optimizations actually had an impact.
> 
> The test works for cluster_size between 8k and 2M (for smaller
> sizes, it fails because our allocation patterns are not contiguous
> with small clusters - in part, the largest consecutive allocation
> we tend to get is often bounded by the size covered by one L2
> table).
> 
> Note that testing for zero clusters is tricky: 'qemu-io map'
> reports whether data comes from the current layer of the image
> (useful for sniffing out which regions of the file have
> QCOW_OFLAG_ZERO) - but doesn't show which clusters have mappings;
> while 'qemu-img map' sees "zero":true for both unallocated and
> zero clusters for any qcow2 with no backing layer (so less useful
> at detecting true zero clusters), but reliably shows mappings.
> So we have to rely on both queries side-by-side at each point of
> the test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v12: probe the map in more places, to make test easier to follow
> v11: reserved for blkdebug half of v10
> v10: drop any changes to v2 files, rewrite test to work with updates
> earlier in the series, add a blkdebug probe
> v9: new patch
> ---
>  tests/qemu-iotests/179     | 132 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/179.out | 160 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 293 insertions(+)
>  create mode 100755 tests/qemu-iotests/179
>  create mode 100644 tests/qemu-iotests/179.out

[...]

> +wrote 2097152/2097152 bytes at offset 20971520
> +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 2097152/2097152 bytes at offset 20971520
> +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 2097152/2097152 bytes at offset 29360128
> +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +Failed to flush the L2 table cache: Input/output error
> +Failed to flush the refcount block cache: Input/output error
> +wrote 2097152/2097152 bytes at offset 0
> +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)

Maybe we should drop this last test that blkdebug works. It fails (for
me) on tmpfs (because the write itself fails there), and with -c unsafe
(because it doesn't flush the refcount cache). I'd be fine with just
believing blkdebug that it does its job...

Max

> +*** done



* Re: [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster
  2017-05-05 22:06   ` Max Reitz
@ 2017-05-05 22:41     ` Eric Blake
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-05 22:41 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]

On 05/05/2017 05:06 PM, Max Reitz wrote:
> On 04.05.2017 05:07, Eric Blake wrote:
>> We've already improved discards to operate efficiently on the tail
>> of an unaligned qcow2 image; it's time to make a similar improvement
>> to write zeroes.  The special case is only valid at the tail
>> cluster of a file, where we must recognize that any sectors beyond
>> the image end would implicitly read as zero, and therefore should
>> not penalize our logic for widening a partial cluster into writing
>> the whole cluster as zero.
>>

>>  $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>>
>> +echo
>> +echo == unaligned image tail cluster, no allocation needed ==
> 
> [...]
> 
>> +# A preallocated cluster maintains its allocation, whether it stays as
>> +# data due to a partial write:
>> +# Convert 128m... | XX XX => ... | XX 00
>> +_make_test_img $((size + 1024))
>> +$QEMU_IO -c "write -P 1 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
> 
> s/\.base//, I suppose?

D'oh. Serves me right for renaming which file I was working on.

> 
> (You should read your reference output. "Pattern verification failed" is
> never good. ;-))

Looks like I get to spin a v13 then.

> 
> With that (and the reference output) fixed:
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests]
  2017-05-05 22:18 ` [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Max Reitz
@ 2017-05-05 22:43   ` Eric Blake
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-05 22:43 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 1754 bytes --]

On 05/05/2017 05:18 PM, Max Reitz wrote:
> On 04.05.2017 05:07, Eric Blake wrote:
>> I've collected several improvements for qcow2 zero-cluster handling.
>>
>> Available as a tag at:
>> git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v12
>>
>> Marked as v12 for "hysterical raisins", since it is the half of
>> v10 [1] that was not resubmitted as v11 [2].
>>
>> Depends on Max's block tree:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00641.html
>> and on Max's qcow2 cleanups:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg00689.html
>>
>> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05227.html
>> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05896.html
>>

> I have given an R-b for every patch, so I could apply the series as-is
> (with the really minor fixes we have talked about, and the thing in
> patch 8), and I would be more or less OK with that. But I did have my
> fair share of nit picks and before I discard all of them, I'd like to
> know for sure that you don't intend to address them, or address them in
> a follow-up.
> 
> I'm saying this because some of the issues I had do not really lend
> themselves nicely to a follow-up; for instance the
> s/Data cluster/Cluster allocation/ for patch 4.
> 
> From my perspective, it's not much more difficult to review a respin of
> this series with just these minor points changed and maybe some very
> small patches (the follow-up, basically) added to it.

v13 coming soon to a list near you (but maybe not until Monday) - let's
see if it's the lucky one!

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]


* Re: [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious
  2017-05-05 20:51   ` Max Reitz
@ 2017-05-06 20:30     ` Eric Blake
  0 siblings, 0 replies; 28+ messages in thread
From: Eric Blake @ 2017-05-06 20:30 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf

[-- Attachment #1: Type: text/plain, Size: 3957 bytes --]

On 05/05/2017 03:51 PM, Max Reitz wrote:
> On 04.05.2017 05:07, Eric Blake wrote:
>> Treat plain zero clusters differently from allocated ones, so that
>> we can simplify the logic of checking whether an offset is present.
>> Do this by splitting QCOW2_CLUSTER_ZERO into two new enums,
>> QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC.
>>
>> I tried to arrange the enum so that we could use
>> 'ret <= QCOW2_CLUSTER_ZERO_PLAIN' for all unallocated types, and
>> 'ret >= QCOW2_CLUSTER_ZERO_ALLOC' for allocated types, although
>> I didn't actually end up taking advantage of the layout.
>>
>> In many cases, this leads to simpler code, by properly combining
>> cases (sometimes, both zero types pair together, other times,
>> plain zero is more like unallocated while allocated zero is more
>> like normal).
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>> @@ -558,52 +557,32 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>>      assert(nb_clusters <= INT_MAX);
>>
>>      ret = qcow2_get_cluster_type(*cluster_offset);
>> +    if (s->qcow_version < 3 && (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
>> +                                ret == QCOW2_CLUSTER_ZERO_ALLOC)) {
>> +        qcow2_signal_corruption(bs, true, -1, -1, "Zero cluster entry found"
>> +                                " in pre-v3 image (L2 offset: %#" PRIx64
>> +                                ", L2 index: %#x)", l2_offset, l2_index);
>> +        ret = -EIO;
>> +        goto fail;
>> +    }
...
>> +    case QCOW2_CLUSTER_ZERO_PLAIN:
>>      case QCOW2_CLUSTER_UNALLOCATED:
>>          /* how many empty clusters ? */
>>          c = count_contiguous_clusters_unallocated(nb_clusters,
>> -                                                  &l2_table[l2_index],
>> -                                                  QCOW2_CLUSTER_UNALLOCATED);
>> +                                                  &l2_table[l2_index], ret);
> 
> Nit pick: Using ret here is a bit weird (because it's such a meaningless
> name). It would be good if we had a separate cluster_type variable.

qcow2_get_cluster_offset() returns the cluster type on success, and
-errno on failure.  So 'ret' actually makes some sense: it really is the
value we are about to return.  But it may also work to have a separate
variable up front, then assign ret = cluster_type at the end; I'll play
with it and see which one looks better.
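
For reference, a minimal sketch of the "separate variable up front" option.
This is not the actual patch: the enum ordering is an assumption inferred
from the commit message (unallocated types sort at or below ZERO_PLAIN,
allocated types at or above ZERO_ALLOC), and the sketch_* functions are
hypothetical stand-ins, not qcow2 code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ordering, inferred from the commit message: unallocated types
 * compare <= ZERO_PLAIN, allocated types compare >= ZERO_ALLOC. */
typedef enum {
    QCOW2_CLUSTER_UNALLOCATED,
    QCOW2_CLUSTER_ZERO_PLAIN,
    QCOW2_CLUSTER_ZERO_ALLOC,
    QCOW2_CLUSTER_NORMAL,
    QCOW2_CLUSTER_COMPRESSED,
} QCow2ClusterType;

/* Hypothetical helper: does this cluster type carry a host offset? */
static bool sketch_type_has_offset(QCow2ClusterType type)
{
    return type >= QCOW2_CLUSTER_ZERO_ALLOC;
}

/* Hypothetical stand-in for the tail of qcow2_get_cluster_offset():
 * returns the cluster type on success and -errno on failure. */
static int sketch_get_cluster_offset(QCow2ClusterType type, int qcow_version,
                                     uint64_t *cluster_offset)
{
    /* The named variable is used for all the type checks... */
    QCow2ClusterType cluster_type = type;

    if (qcow_version < 3 && (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN ||
                             cluster_type == QCOW2_CLUSTER_ZERO_ALLOC)) {
        return -EIO; /* zero clusters are not valid in a pre-v3 image */
    }

    if (!sketch_type_has_offset(cluster_type)) {
        *cluster_offset = 0;
    }

    /* ...and only becomes the return value at the very end. */
    return cluster_type;
}
```

Either way the caller sees the same contract: a negative value is an error,
anything else is the cluster type.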

> 
>>          *cluster_offset = 0;
>>          break;
>> +    case QCOW2_CLUSTER_ZERO_ALLOC:
>>      case QCOW2_CLUSTER_NORMAL:
>>          /* how many allocated clusters ? */
>>          c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>> -                &l2_table[l2_index], QCOW_OFLAG_ZERO);
>> +                                      &l2_table[l2_index], QCOW_OFLAG_ZERO);
>>          *cluster_offset &= L2E_OFFSET_MASK;
>>          if (offset_into_cluster(s, *cluster_offset)) {
>>              qcow2_signal_corruption(bs, true, -1, -1, "Data cluster offset %#"
> 
> Well, preallocated zero clusters are not exactly data clusters... Not
> that any user cared, but s/Data cluster/Cluster allocation/ would be
> more correct.

Good idea.

> 
> By the way, allow me to state just how much I love this hunk: Very much.
> Looks great! It gets a place on my list of favorite hunks of this year
> at least.
> 
> [...]
> 
>> @@ -1760,7 +1740,8 @@ static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
>>              int cluster_type = qcow2_get_cluster_type(l2_entry);
>>              bool preallocated = offset != 0;
> 
> I could get behind removing this variable and replacing all
> "if (!preallocated)" instances by
> "if (cluster_type == QCOW2_CLUSTER_ZERO_PLAIN)". Up to you, though.

Makes sense.
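
A rough sketch of why the two tests are interchangeable for zero clusters.
The flag and mask values below are assumptions based on the qcow2 L2 entry
layout (bit 0 = zero flag, bits 9-55 = host offset, bit 62 = compressed),
and the sketch_* functions are hypothetical illustrations rather than the
real helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed L2 entry layout; the values are not taken from this thread. */
#define QCOW_OFLAG_ZERO        (1ULL << 0)
#define QCOW_OFLAG_COMPRESSED  (1ULL << 62)
#define L2E_OFFSET_MASK        0x00fffffffffffe00ULL

typedef enum {
    QCOW2_CLUSTER_UNALLOCATED,
    QCOW2_CLUSTER_ZERO_PLAIN,
    QCOW2_CLUSTER_ZERO_ALLOC,
    QCOW2_CLUSTER_NORMAL,
    QCOW2_CLUSTER_COMPRESSED,
} QCow2ClusterType;

/* Hypothetical classifier: a zero cluster is "plain" when it has no host
 * offset and "alloc" when it does. */
static QCow2ClusterType sketch_get_cluster_type(uint64_t l2_entry)
{
    if (l2_entry & QCOW_OFLAG_COMPRESSED) {
        return QCOW2_CLUSTER_COMPRESSED;
    } else if (l2_entry & QCOW_OFLAG_ZERO) {
        if (l2_entry & L2E_OFFSET_MASK) {
            return QCOW2_CLUSTER_ZERO_ALLOC;
        }
        return QCOW2_CLUSTER_ZERO_PLAIN;
    } else if (!(l2_entry & L2E_OFFSET_MASK)) {
        return QCOW2_CLUSTER_UNALLOCATED;
    }
    return QCOW2_CLUSTER_NORMAL;
}

/* Hypothetical replacement for "if (!preallocated)": for a zero cluster,
 * offset == 0 is exactly the ZERO_PLAIN case. */
static bool sketch_zero_needs_alloc(uint64_t l2_entry)
{
    return sketch_get_cluster_type(l2_entry) == QCOW2_CLUSTER_ZERO_PLAIN;
}
```

The equivalence only holds once non-zero cluster types have been filtered
out, since an unallocated entry also has offset 0.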

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]


end of thread, other threads:[~2017-05-06 20:30 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04  3:07 [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Eric Blake
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 01/10] qcow2: Use consistent switch indentation Eric Blake
2017-05-05 19:42   ` Max Reitz
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 02/10] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
2017-05-05 20:06   ` Max Reitz
2017-05-05 20:13     ` Eric Blake
2017-05-05 20:23       ` Max Reitz
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 03/10] qcow2: Correctly report status of preallocated zero clusters Eric Blake
2017-05-05 20:24   ` Max Reitz
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 04/10] qcow2: Make distinction between zero cluster types obvious Eric Blake
2017-05-05 20:51   ` Max Reitz
2017-05-06 20:30     ` Eric Blake
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 05/10] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
2017-05-05 20:55   ` Max Reitz
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 06/10] iotests: Improve _filter_qemu_img_map Eric Blake
2017-05-05 20:58   ` Max Reitz
2017-05-05 21:06     ` Eric Blake
2017-05-05 21:07       ` Max Reitz
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 07/10] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
2017-05-05 21:24   ` Max Reitz
2017-05-05 22:29   ` Max Reitz
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 08/10] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
2017-05-05 22:06   ` Max Reitz
2017-05-05 22:41     ` Eric Blake
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 09/10] qcow2: Assert that cluster operations are aligned Eric Blake
2017-05-04  3:07 ` [Qemu-devel] [PATCH v12 10/10] qcow2: Discard/zero clusters by byte count Eric Blake
2017-05-05 22:18 ` [Qemu-devel] [PATCH v12 00/10] qcow2 zero-cluster tweaks [was add blkdebug tests] Max Reitz
2017-05-05 22:43   ` Eric Blake
