All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v10 00/17] add blkdebug tests
@ 2017-04-27  1:46 Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
                   ` (16 more replies)
  0 siblings, 17 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v10

Prerequisites: Max's block-next tree:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html
Fam's fix for test 109:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03195.html

v9 was:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01723.html

Since then:
- Rebase to master
- Add even more qcow2 patches to make tracking of preallocated
zero clusters correct
- replace v9 1/13 with 4/17 that is not as invasive (no change to
v2 format, and no questionable use of unallocated clusters)
- rewrite v9 2/13 (iotests 179) with 5/17, as something more robust

001/17:[down] 'block: Update comments on BDRV_BLOCK_* meanings'
002/17:[down] 'qcow2: Correctly report status of preallocated zero clusters'
003/17:[down] 'qcow2: Reuse preallocated zero clusters'
004/17:[down] 'qcow2: Optimize zero_single_l2() to minimize L2 churn'
005/17:[0187] [FC] 'iotests: Add test 179 to cover write zeroes with unmap'
006/17:[down] 'qemu-io: Don't open-code QEMU_IS_ALIGNED'
007/17:[0010] [FC] 'qemu-io: Switch 'alloc' command to byte-based length'
008/17:[0022] [FC] 'qemu-io: Switch 'map' output to byte-based reporting'
009/17:[0006] [FC] 'qcow2: Optimize write zero of unaligned tail cluster'
010/17:[----] [-C] 'qcow2: Assert that cluster operations are aligned'
011/17:[0005] [FC] 'qcow2: Discard/zero clusters by byte count'
012/17:[----] [--] 'blkdebug: Sanity check block layer guarantees'
013/17:[----] [--] 'blkdebug: Refactor error injection'
014/17:[----] [--] 'blkdebug: Add pass-through write_zero and discard support'
015/17:[----] [--] 'blkdebug: Simplify override logic'
016/17:[----] [--] 'blkdebug: Add ability to override unmap geometries'
017/17:[----] [-C] 'tests: Add coverage for recent block geometry fixes'

Eric Blake (16):
  block: Update comments on BDRV_BLOCK_* meanings
  qcow2: Correctly report status of preallocated zero clusters
  qcow2: Optimize zero_single_l2() to minimize L2 churn
  iotests: Add test 179 to cover write zeroes with unmap
  qemu-io: Don't open-code QEMU_IS_ALIGNED
  qemu-io: Switch 'alloc' command to byte-based length
  qemu-io: Switch 'map' output to byte-based reporting
  qcow2: Optimize write zero of unaligned tail cluster
  qcow2: Assert that cluster operations are aligned
  qcow2: Discard/zero clusters by byte count
  blkdebug: Sanity check block layer guarantees
  blkdebug: Refactor error injection
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Simplify override logic
  blkdebug: Add ability to override unmap geometries
  tests: Add coverage for recent block geometry fixes

Max Reitz (1):
  qcow2: Reuse preallocated zero clusters

 qapi/block-core.json              |  33 ++++-
 block/qcow2.h                     |  12 +-
 include/block/block.h             |  13 +-
 block/blkdebug.c                  | 264 +++++++++++++++++++++++++++++++-------
 block/qcow2-cluster.c             | 177 +++++++++++++++++--------
 block/qcow2-snapshot.c            |   7 +-
 block/qcow2.c                     |  28 ++--
 qemu-io-cmds.c                    |  45 ++++---
 tests/qemu-iotests/019.out        |   8 +-
 tests/qemu-iotests/102.out        |   4 +-
 tests/qemu-iotests/146.out        |  30 ++---
 tests/qemu-iotests/154            | 112 +++++++++++++++-
 tests/qemu-iotests/154.out        |  83 ++++++++++++
 tests/qemu-iotests/177            | 114 ++++++++++++++++
 tests/qemu-iotests/177.out        |  49 +++++++
 tests/qemu-iotests/179            | 123 ++++++++++++++++++
 tests/qemu-iotests/179.out        |  90 +++++++++++++
 tests/qemu-iotests/common.pattern |   2 +-
 tests/qemu-iotests/group          |   2 +
 19 files changed, 1024 insertions(+), 172 deletions(-)
 create mode 100755 tests/qemu-iotests/177
 create mode 100644 tests/qemu-iotests/177.out
 create mode 100755 tests/qemu-iotests/179
 create mode 100644 tests/qemu-iotests/179.out

-- 
2.9.3

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 17:12   ` Max Reitz
  2017-04-28 20:18   ` Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters Eric Blake
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

We had some conflicting documentation: a nice 8-way table that
described all possible combinations of DATA, ZERO, and
OFFSET_VALID, couple with text that implied that OFFSET_VALID
always meant raw data could be read directly.  As the 8-way
table is the intended semantics, simplify the rest of the text
to get rid of the confusion.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: new patch
---
 include/block/block.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 862eb56..04fcbd0 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -121,21 +121,22 @@ typedef struct HDGeometry {

 /*
  * Allocation status flags
- * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status.
+ * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status
  * BDRV_BLOCK_ZERO: sectors read as zero
- * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
- *                          bdrv_get_block_status.
+ * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
  * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
  *                       layer (as opposed to the backing file)
  * BDRV_BLOCK_RAW: used internally to indicate that the request
  *                 was answered by the raw driver and that one
  *                 should look in bs->file directly.
  *
- * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
- * bs->file where sector data can be read from as raw data.
- *
  * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
  *
+ * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset
+ * in bs->file that is allocated for the corresponding raw data;
+ * however, whether that offset actually contains data also depends on
+ * BDRV_BLOCK_DATA, as follows:
+ *
  * DATA ZERO OFFSET_VALID
  *  t    t        t       sectors read as zero, bs->file is zero at offset
  *  t    f        t       sectors read as valid from bs->file at offset
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 17:35   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse " Eric Blake
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

We were throwing away the preallocation information associated with
zero clusters.  But we should be matching the well-defined semantics
in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
while still reminding the user that reading from that offset is
likely to read garbage.

Making this change lets us see which portions of an image are zero
but preallocated, when using qemu-img map --output=json.  The
--output=human side intentionally ignores all zero clusters, whether
or not they are preallocated.

The fact that there is no change to qemu-iotests './check -qcow2'
merely means that we aren't yet testing this aspect of qemu-img;
a later patch will add a test.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: new patch
---
 block/qcow2-cluster.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 100398c..d1063df 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -328,6 +328,10 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
 	return i;
 }

+/*
+ * Checks how many consecutive clusters in a given L2 table have the same
+ * cluster type with no corresponding allocation.
+ */
 static int count_contiguous_clusters_by_type(int nb_clusters,
                                              uint64_t *l2_table,
                                              int wanted_type)
@@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int nb_clusters,
     int i;

     for (i = 0; i < nb_clusters; i++) {
-        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
+        uint64_t entry = be64_to_cpu(l2_table[i]);
+        int type = qcow2_get_cluster_type(entry);

-        if (type != wanted_type) {
+        if (type != wanted_type || entry & L2E_OFFSET_MASK) {
             break;
         }
     }
@@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
             ret = -EIO;
             goto fail;
         }
-        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
-                                              QCOW2_CLUSTER_ZERO);
-        *cluster_offset = 0;
+        /* Distinguish between pure zero clusters and pre-allocated ones */
+        if (*cluster_offset & L2E_OFFSET_MASK) {
+            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
+                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
+            *cluster_offset &= L2E_OFFSET_MASK;
+            if (offset_into_cluster(s, *cluster_offset)) {
+                qcow2_signal_corruption(bs, true, -1, -1,
+                                        "Preallocated zero cluster offset %#"
+                                        PRIx64 " unaligned (L2 offset: %#"
+                                        PRIx64 ", L2 index: %#x)",
+                                        *cluster_offset, l2_offset, l2_index);
+                ret = -EIO;
+                goto fail;
+            }
+        } else {
+            c = count_contiguous_clusters_by_type(nb_clusters,
+                                                  &l2_table[l2_index],
+                                                  QCOW2_CLUSTER_ZERO);
+            *cluster_offset = 0;
+        }
         break;
     case QCOW2_CLUSTER_UNALLOCATED:
         /* how many empty clusters ? */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse preallocated zero clusters
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 17:51   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

From: Max Reitz <mreitz@redhat.com>

Instead of just freeing preallocated zero clusters and completely
allocating them from scratch, reuse them.

We cannot do this in handle_copied(), however, since this is a COW
operation. Therefore, we have to add the new logic to handle_alloc() and
simply return the existing offset if it exists. The only catch is that
we have to convince qcow2_alloc_cluster_link_l2() not to free the old
clusters (because we have reused them).

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: new patch. Max hasn't posted the patch directly on list, but
did mention it here:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03936.html
and that he would post it once he has tests. Well, my later patches
add a test that requires this one :)  The other two patches that
he mentioned there are also good, but not essential for my series
(and I didn't want to write tests to expose the behavior difference,
because it would deprive Max of that fun).
---
 block/qcow2.h         |  3 ++
 block/qcow2-cluster.c | 83 +++++++++++++++++++++++++++++++++++----------------
 2 files changed, 60 insertions(+), 26 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index f8aeb08..8731f24 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -322,6 +322,9 @@ typedef struct QCowL2Meta
     /** Number of newly allocated clusters */
     int nb_clusters;

+    /** Do not free the old clusters */
+    bool keep_old_clusters;
+
     /**
      * Requests that overlap with this allocation and wait to be restarted
      * when the allocating request has completed.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d1063df..db3d937 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -309,14 +309,20 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
         uint64_t *l2_table, uint64_t stop_flags)
 {
     int i;
+    int first_cluster_type;
     uint64_t mask = stop_flags | L2E_OFFSET_MASK | QCOW_OFLAG_COMPRESSED;
     uint64_t first_entry = be64_to_cpu(l2_table[0]);
     uint64_t offset = first_entry & mask;

-    if (!offset)
+    if (!offset) {
         return 0;
+    }

-    assert(qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL);
+    /* must be allocated */
+    first_cluster_type = qcow2_get_cluster_type(first_entry);
+    assert(first_cluster_type == QCOW2_CLUSTER_NORMAL ||
+           (first_cluster_type == QCOW2_CLUSTER_ZERO &&
+            (first_entry & L2E_OFFSET_MASK) != 0));

     for (i = 0; i < nb_clusters; i++) {
         uint64_t l2_entry = be64_to_cpu(l2_table[i]) & mask;
@@ -857,7 +863,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * Don't discard clusters that reach a refcount of 0 (e.g. compressed
      * clusters), the next write will reuse them anyway.
      */
-    if (j != 0) {
+    if (!m->keep_old_clusters && j != 0) {
         for (i = 0; i < j; i++) {
             qcow2_free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1,
                                     QCOW2_DISCARD_NEVER);
@@ -1154,8 +1160,9 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t entry;
     uint64_t nb_clusters;
     int ret;
+    bool keep_old_clusters = false;

-    uint64_t alloc_cluster_offset;
+    uint64_t alloc_cluster_offset = 0;

     trace_qcow2_handle_alloc(qemu_coroutine_self(), guest_offset, *host_offset,
                              *bytes);
@@ -1192,31 +1199,53 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
      * wrong with our code. */
     assert(nb_clusters > 0);

+    if (!*host_offset && qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO &&
+        (entry & L2E_OFFSET_MASK) != 0 && (entry & QCOW_OFLAG_COPIED))
+    {
+        /* Try to reuse preallocated zero clusters; contiguous normal clusters
+         * would be fine, too, but count_cow_clusters() above has limited
+         * nb_clusters already to a range of COW clusters */
+        int preallocated_nb_clusters =
+            count_contiguous_clusters(nb_clusters, s->cluster_size, l2_table,
+                                      QCOW_OFLAG_COPIED);
+
+        if (preallocated_nb_clusters) {
+            nb_clusters = preallocated_nb_clusters;
+            alloc_cluster_offset = entry & L2E_OFFSET_MASK;
+
+            /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
+             * should not free them. */
+            keep_old_clusters = true;
+        }
+    }
+
     qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);

-    /* Allocate, if necessary at a given offset in the image file */
-    alloc_cluster_offset = start_of_cluster(s, *host_offset);
-    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
-                                  &nb_clusters);
-    if (ret < 0) {
-        goto fail;
-    }
-
-    /* Can't extend contiguous allocation */
-    if (nb_clusters == 0) {
-        *bytes = 0;
-        return 0;
-    }
-
-    /* !*host_offset would overwrite the image header and is reserved for "no
-     * host offset preferred". If 0 was a valid host offset, it'd trigger the
-     * following overlap check; do that now to avoid having an invalid value in
-     * *host_offset. */
     if (!alloc_cluster_offset) {
-        ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
-                                            nb_clusters * s->cluster_size);
-        assert(ret < 0);
-        goto fail;
+        /* Allocate, if necessary at a given offset in the image file */
+        alloc_cluster_offset = start_of_cluster(s, *host_offset);
+        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
+                                      &nb_clusters);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        /* Can't extend contiguous allocation */
+        if (nb_clusters == 0) {
+            *bytes = 0;
+            return 0;
+        }
+
+        /* !*host_offset would overwrite the image header and is reserved for
+         * "no host offset preferred". If 0 was a valid host offset, it'd
+         * trigger the following overlap check; do that now to avoid having an
+         * invalid value in *host_offset. */
+        if (!alloc_cluster_offset) {
+            ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
+                                                nb_clusters * s->cluster_size);
+            assert(ret < 0);
+            goto fail;
+        }
     }

     /*
@@ -1247,6 +1276,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         .offset         = start_of_cluster(s, guest_offset),
         .nb_clusters    = nb_clusters,

+        .keep_old_clusters  = keep_old_clusters,
+
         .cow_start = {
             .offset     = 0,
             .nb_bytes   = offset_into_cluster(s, guest_offset),
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (2 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse " Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 18:00   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Similar to discard_single_l2(), we should try to avoid dirtying
the L2 cache when the cluster we are changing already has the
right characteristics.

Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
is a requirement to unallocate a cluster (this is because the block
layer clears that flag if discard.* flags during open requested that
we never punch holes - see the conversation around commit 170f4b2e,
https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
Therefore, this patch can only reuse a zero cluster as-is if either
unmapping is not requested, or if the zero cluster was not associated
with an allocation.

Technically, there are some cases where an unallocated cluster
already reads as all zeroes (namely, when there is no backing file
[easy: check bs->backing], or when the backing file also reads as
zeroes [harder: we can't check bdrv_get_block_status since we are
already holding the lock]), where the guest would not immediately see
a difference if we left that cluster unallocated.  But if the user
did not request unmapping, leaving an unallocated cluster is wrong;
and even if the user DID request unmapping, keeping a cluster
unallocated risks a subtle semantic change of guest-visible contents
if a backing file is later added, and it is not worth auditing
whether all internal uses such as mirror properly avoid an unmap
request.  Thus, this patch is intentionally limited to just clusters
that are already marked as zero.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: new patch, replacing earlier attempt to use unallocated clusters,
and ditching any optimization of v2 files
---
 block/qcow2-cluster.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index db3d937..d542894 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1614,6 +1614,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     int l2_index;
     int ret;
     int i;
+    bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);

     ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
     if (ret < 0) {
@@ -1629,9 +1630,17 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,

         old_offset = be64_to_cpu(l2_table[l2_index + i]);

-        /* Update L2 entries */
+        /*
+         * Minimize L2 changes if the cluster already reads back as
+         * zeroes with correct allocation.
+         */
+        if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
+            !(unmap && old_offset & L2E_OFFSET_MASK)) {
+            continue;
+        }
+
         qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
-        if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
+        if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {
             l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (3 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 19:28   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED Eric Blake
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

No tests were covering write zeroes with unmap.  Additionally,
I needed to prove that my previous patches for correct status
reporting and write zeroes optimizations actually had an impact.

The test works for cluster_size between 8k and 2M (for smaller
sizes, it fails because our allocation patterns are not contiguous
with small clusters).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: drop any changes to v2 files, rewrite test to work with updates
earlier in the series, add a blkdebug probe
v9: new patch
---
 tests/qemu-iotests/179     | 123 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/179.out |  90 +++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 214 insertions(+)
 create mode 100755 tests/qemu-iotests/179
 create mode 100644 tests/qemu-iotests/179.out

diff --git a/tests/qemu-iotests/179 b/tests/qemu-iotests/179
new file mode 100755
index 0000000..4e4ffce
--- /dev/null
+++ b/tests/qemu-iotests/179
@@ -0,0 +1,123 @@
+#!/bin/bash
+#
+# Test case for write zeroes with unmap
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# v2 images can't mark clusters as zero
+_unsupported_imgopts compat=0.10
+
+echo
+echo '=== Testing write zeroes with unmap ==='
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 64M
+_make_test_img -b "$TEST_IMG.base"
+
+# Offsets chosen at or near 2M boundaries so test works at any cluster size
+
+# Aligned writes to unallocated cluster should not allocate mapping, but must
+# mark cluster as zero, whether or not unmap was requested
+$QEMU_IO -c "write -z -u 2M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 6M 2M" "$TEST_IMG.base" | _filter_qemu_io
+
+# Unaligned writes need not allocate mapping if the cluster already reads
+# as zero, but must mark cluster as zero, whether or not unmap was requested
+$QEMU_IO -c "write -z -u 10485761 2097150" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 14680065 2097150" "$TEST_IMG.base" | _filter_qemu_io
+
+# Requesting unmap of normal data must deallocate; omitting unmap should
+# preserve the mapping
+$QEMU_IO -c "write 18M 14M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 20M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 24M 6M" "$TEST_IMG.base" | _filter_qemu_io
+
+# Likewise when writing on already-mapped zero data
+$QEMU_IO -c "write -z -u 26M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 28M 2M" "$TEST_IMG.base" | _filter_qemu_io
+
+# Writing on unmapped zeroes does not allocate
+$QEMU_IO -c "write -z 32M 8M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 34M 2M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 36M 2M" "$TEST_IMG.base" | _filter_qemu_io
+
+# Writing zero overrides a backing file, regardless of backing cluster type
+$QEMU_IO -c "write -z 40M 8M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write 48M 8M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 42M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z 44M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 50M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z 52M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 58M 2M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z 60M 2M" "$TEST_IMG" | _filter_qemu_io
+
+# Final check that mappings are correct and images are still sane
+$QEMU_IO -c "map" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map |
+    sed 's/"offset": [0-9]*/"offset": OFFSET/g'
+TEST_IMG="$TEST_IMG.base" _check_test_img
+_check_test_img
+
+echo
+echo '=== Testing cache optimization ==='
+echo
+
+BLKDBG_TEST_IMG="blkdebug:$TEST_DIR/blkdebug.conf:$TEST_IMG.base"
+
+cat > "$TEST_DIR/blkdebug.conf" <<EOF
+[inject-error]
+event = "l2_update"
+errno = "5"
+immediately = "on"
+once = "off"
+EOF
+
+# None of the following writes should trigger an L2 update, because the
+# cluster already reads as zero, and we don't have to change allocation
+$QEMU_IO -c "w -z -u 20M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "w -z 20M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "w -z 28M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+
+# Proof that our blkdebug hook works
+$QEMU_IO -c "w -z -u 0M 2M" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+
+# success, all done
+echo '*** done'
+status=0
diff --git a/tests/qemu-iotests/179.out b/tests/qemu-iotests/179.out
new file mode 100644
index 0000000..2c3071b
--- /dev/null
+++ b/tests/qemu-iotests/179.out
@@ -0,0 +1,90 @@
+QA output created by 179
+
+=== Testing write zeroes with unmap ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 2097152/2097152 bytes at offset 2097152
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 6291456
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097150/2097150 bytes at offset 10485761
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097150/2097150 bytes at offset 14680065
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 14680064/14680064 bytes at offset 18874368
+14 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 6291456/6291456 bytes at offset 25165824
+6 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 27262976
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 29360128
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 8388608/8388608 bytes at offset 33554432
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 35651584
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 37748736
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 8388608/8388608 bytes at offset 41943040
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 8388608/8388608 bytes at offset 50331648
+8 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 44040192
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 46137344
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 52428800
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 54525952
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 60817408
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 62914560
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[                       0]     4096/  131072 sectors not allocated at offset 0 bytes (0)
+[                 2097152]     4096/  126976 sectors     allocated at offset 2 MiB (1)
+[                 4194304]     4096/  122880 sectors not allocated at offset 4 MiB (0)
+[                 6291456]     4096/  118784 sectors     allocated at offset 6 MiB (1)
+[                 8388608]     4096/  114688 sectors not allocated at offset 8 MiB (0)
+[                10485760]     4096/  110592 sectors     allocated at offset 10 MiB (1)
+[                12582912]     4096/  106496 sectors not allocated at offset 12 MiB (0)
+[                14680064]     4096/  102400 sectors     allocated at offset 14 MiB (1)
+[                16777216]     4096/   98304 sectors not allocated at offset 16 MiB (0)
+[                18874368]    77824/   94208 sectors     allocated at offset 18 MiB (1)
+[                58720256]    16384/   16384 sectors not allocated at offset 56 MiB (0)
+[{ "start": 0, "length": 18874368, "depth": 1, "zero": true, "data": false},
+{ "start": 18874368, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20971520, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 23068672, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 25165824, "length": 2097152, "depth": 1, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 27262976, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 29360128, "length": 2097152, "depth": 1, "zero": true, "data": false, "offset": OFFSET},
+{ "start": 31457280, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 33554432, "length": 10485760, "depth": 1, "zero": true, "data": false},
+{ "start": 44040192, "length": 4194304, "depth": 0, "zero": true, "data": false},
+{ "start": 48234496, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 50331648, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 52428800, "length": 4194304, "depth": 0, "zero": true, "data": false},
+{ "start": 56623104, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 58720256, "length": 2097152, "depth": 1, "zero": true, "data": false},
+{ "start": 60817408, "length": 4194304, "depth": 0, "zero": true, "data": false},
+{ "start": 65011712, "length": 2097152, "depth": 1, "zero": true, "data": false}]
+No errors were found on the image.
+No errors were found on the image.
+
+=== Testing cache optimization ===
+
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 20971520
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2097152/2097152 bytes at offset 29360128
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Failed to flush the L2 table cache: Input/output error
+Failed to flush the refcount block cache: Input/output error
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 893962d..11ba1d9 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -169,4 +169,5 @@
 174 auto
 175 auto quick
 176 rw auto backing
+179 rw auto quick
 181 rw auto migration
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (4 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 19:30   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Manual comparison against 0x1ff is not as clean as using our
alignment macros from osdep.h.

Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: new patch
---
 qemu-io-cmds.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 21af9e6..fabc394 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -740,12 +740,12 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
     }

     if (bflag) {
-        if (offset & 0x1ff) {
+        if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
             printf("offset %" PRId64 " is not sector aligned\n",
                    offset);
             return 0;
         }
-        if (count & 0x1ff) {
+        if (!QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE)) {
             printf("count %"PRId64" is not sector aligned\n",
                    count);
             return 0;
@@ -1050,13 +1050,13 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
     }

     if (bflag || cflag) {
-        if (offset & 0x1ff) {
+        if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
             printf("offset %" PRId64 " is not sector aligned\n",
                    offset);
             return 0;
         }

-        if (count & 0x1ff) {
+        if (!QEMU_IS_ALIGNED(count, BDRV_SECTOR_SIZE)) {
             printf("count %"PRId64" is not sector aligned\n",
                    count);
             return 0;
@@ -1769,7 +1769,7 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
     if (offset < 0) {
         print_cvtnum_err(offset, argv[1]);
         return 0;
-    } else if (offset & 0x1ff) {
+    } else if (!QEMU_IS_ALIGNED(offset, BDRV_SECTOR_SIZE)) {
         printf("offset %" PRId64 " is not sector aligned\n",
                offset);
         return 0;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (5 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 19:46   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

For the 'alloc' command, accepting an offset in bytes but a length
in sectors, and reporting output in sectors, is confusing.  Do
everything in bytes, and adjust the expected output accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

---
v10: rebase to code cleanup
v9: new patch
---
 qemu-io-cmds.c                    | 30 ++++++++++++++++++------------
 tests/qemu-iotests/019.out        |  8 ++++----
 tests/qemu-iotests/common.pattern |  2 +-
 3 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index fabc394..34f6707 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1760,7 +1760,7 @@ out:
 static int alloc_f(BlockBackend *blk, int argc, char **argv)
 {
     BlockDriverState *bs = blk_bs(blk);
-    int64_t offset, sector_num, nb_sectors, remaining;
+    int64_t offset, sector_num, nb_sectors, remaining, bytes;
     char s1[64];
     int num, ret;
     int64_t sum_alloc;
@@ -1776,18 +1776,24 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
     }

     if (argc == 3) {
-        nb_sectors = cvtnum(argv[2]);
-        if (nb_sectors < 0) {
-            print_cvtnum_err(nb_sectors, argv[2]);
+        bytes = cvtnum(argv[2]);
+        if (bytes < 0) {
+            print_cvtnum_err(bytes, argv[2]);
             return 0;
-        } else if (nb_sectors > INT_MAX) {
-            printf("length argument cannot exceed %d, given %s\n",
-                   INT_MAX, argv[2]);
+        } else if (bytes > INT_MAX * BDRV_SECTOR_SIZE) {
+            printf("length argument cannot exceed %llu, given %s\n",
+                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
             return 0;
         }
     } else {
-        nb_sectors = 1;
+        bytes = BDRV_SECTOR_SIZE;
     }
+    if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
+        printf("bytes %" PRId64 " is not sector aligned\n",
+               bytes);
+        return 0;
+    }
+    nb_sectors = bytes >> BDRV_SECTOR_BITS;

     remaining = nb_sectors;
     sum_alloc = 0;
@@ -1811,8 +1817,8 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)

     cvtstr(offset, s1, sizeof(s1));

-    printf("%"PRId64"/%"PRId64" sectors allocated at offset %s\n",
-           sum_alloc, nb_sectors, s1);
+    printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
+           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
     return 0;
 }

@@ -1822,8 +1828,8 @@ static const cmdinfo_t alloc_cmd = {
     .argmin     = 1,
     .argmax     = 2,
     .cfunc      = alloc_f,
-    .args       = "off [sectors]",
-    .oneline    = "checks if a sector is present in the file",
+    .args       = "offset [bytes]",
+    .oneline    = "checks if offset is allocated in the file",
 };


diff --git a/tests/qemu-iotests/019.out b/tests/qemu-iotests/019.out
index 0124264..17a7c03 100644
--- a/tests/qemu-iotests/019.out
+++ b/tests/qemu-iotests/019.out
@@ -542,8 +542,8 @@ Testing conversion with -B TEST_DIR/t.IMGFMT.base

 Checking if backing clusters are allocated when they shouldn't

-0/128 sectors allocated at offset 1 MiB
-0/128 sectors allocated at offset 4.001 GiB
+0/65536 bytes allocated at offset 1 MiB
+0/65536 bytes allocated at offset 4.001 GiB
 Reading

 === IO: pattern 42
@@ -1086,8 +1086,8 @@ Testing conversion with -o backing_file=TEST_DIR/t.IMGFMT.base

 Checking if backing clusters are allocated when they shouldn't

-0/128 sectors allocated at offset 1 MiB
-0/128 sectors allocated at offset 4.001 GiB
+0/65536 bytes allocated at offset 1 MiB
+0/65536 bytes allocated at offset 4.001 GiB
 Reading

 === IO: pattern 42
diff --git a/tests/qemu-iotests/common.pattern b/tests/qemu-iotests/common.pattern
index ddfbca1..34f4a8d 100644
--- a/tests/qemu-iotests/common.pattern
+++ b/tests/qemu-iotests/common.pattern
@@ -18,7 +18,7 @@

 function do_is_allocated() {
     local start=$1
-    local size=$(( $2 / 512))
+    local size=$2
     local step=$3
     local count=$4

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (6 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 19:53   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Mixing byte offset and sector allocation counts is a bit
confusing.  Also, reporting n/m sectors, where m decreases
according to the remaining size of the file, isn't really
adding any useful information.  Update the output to use
byte counts, and adjust the affected tests (./check -qcow2 102,
./check -vpc 146).

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: rebase to updated test 179
v9: new patch
---
 qemu-io-cmds.c             |  5 ++---
 tests/qemu-iotests/102.out |  4 ++--
 tests/qemu-iotests/146.out | 30 +++++++++++++++---------------
 tests/qemu-iotests/179.out | 22 +++++++++++-----------
 4 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 34f6707..40a3f2d 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1895,9 +1895,8 @@ static int map_f(BlockBackend *blk, int argc, char **argv)

         retstr = ret ? "    allocated" : "not allocated";
         cvtstr(offset << 9ULL, s1, sizeof(s1));
-        printf("[% 24" PRId64 "] % 8" PRId64 "/% 8" PRId64 " sectors %s "
-               "at offset %s (%d)\n",
-               offset << 9ULL, num, nb_sectors, retstr, s1, ret);
+        printf("[% 24" PRId64 "] % 16" PRId64 " bytes %s at offset %s (%d)\n",
+               offset << 9ULL, num << 9ULL, retstr, s1, ret);

         offset += num;
         nb_sectors -= num;
diff --git a/tests/qemu-iotests/102.out b/tests/qemu-iotests/102.out
index eecde16..0e0ae4c 100644
--- a/tests/qemu-iotests/102.out
+++ b/tests/qemu-iotests/102.out
@@ -6,7 +6,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Image resized.
-[                       0]      128/     128 sectors     allocated at offset 0 bytes (1)
+[                       0]            65536 bytes     allocated at offset 0 bytes (1)
 Offset          Length          Mapped to       File

 === Testing map on an image file truncated outside of qemu ===
@@ -17,5 +17,5 @@ wrote 65536/65536 bytes at offset 0
 Image resized.
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qemu-io drv0 map
-[                       0]      128/     128 sectors     allocated at offset 0 bytes (1)
+[                       0]            65536 bytes     allocated at offset 0 bytes (1)
 *** done
diff --git a/tests/qemu-iotests/146.out b/tests/qemu-iotests/146.out
index 4f334d8..d4e2f62 100644
--- a/tests/qemu-iotests/146.out
+++ b/tests/qemu-iotests/146.out
@@ -2,39 +2,39 @@ QA output created by 146

 === Testing VPC Autodetect ===

-[                       0]  266334240/ 266334240 sectors not allocated at offset 0 bytes (0)
+[                       0]     136363130880 bytes not allocated at offset 0 bytes (0)

 === Testing VPC with current_size force ===

-[                       0]  266338304/ 266338304 sectors not allocated at offset 0 bytes (0)
+[                       0]     136365211648 bytes not allocated at offset 0 bytes (0)

 === Testing VPC with chs force ===

-[                       0]  266334240/ 266334240 sectors not allocated at offset 0 bytes (0)
+[                       0]     136363130880 bytes not allocated at offset 0 bytes (0)

 === Testing Hyper-V Autodetect ===

-[                       0]  266338304/ 266338304 sectors not allocated at offset 0 bytes (0)
+[                       0]     136365211648 bytes not allocated at offset 0 bytes (0)

 === Testing Hyper-V with current_size force ===

-[                       0]  266338304/ 266338304 sectors not allocated at offset 0 bytes (0)
+[                       0]     136365211648 bytes not allocated at offset 0 bytes (0)

 === Testing Hyper-V with chs force ===

-[                       0]  266334240/ 266334240 sectors not allocated at offset 0 bytes (0)
+[                       0]     136363130880 bytes not allocated at offset 0 bytes (0)

 === Testing d2v Autodetect ===

-[                       0]   514560/  514560 sectors     allocated at offset 0 bytes (1)
+[                       0]        263454720 bytes     allocated at offset 0 bytes (1)

 === Testing d2v with current_size force ===

-[                       0]   514560/  514560 sectors     allocated at offset 0 bytes (1)
+[                       0]        263454720 bytes     allocated at offset 0 bytes (1)

 === Testing d2v with chs force ===

-[                       0]   514560/  514560 sectors     allocated at offset 0 bytes (1)
+[                       0]        263454720 bytes     allocated at offset 0 bytes (1)

 === Testing Image create, default ===

@@ -42,15 +42,15 @@ Formatting 'TEST_DIR/IMGFMT-create-test.IMGFMT', fmt=IMGFMT size=4294967296

 === Read created image, default opts ====

-[                       0]  8389584/ 8389584 sectors not allocated at offset 0 bytes (0)
+[                       0]       4295467008 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=chs ====

-[                       0]  8389584/ 8389584 sectors not allocated at offset 0 bytes (0)
+[                       0]       4295467008 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=current_size ====

-[                       0]  8389584/ 8389584 sectors not allocated at offset 0 bytes (0)
+[                       0]       4295467008 bytes not allocated at offset 0 bytes (0)

 === Testing Image create, force_size ===

@@ -58,13 +58,13 @@ Formatting 'TEST_DIR/IMGFMT-create-test.IMGFMT', fmt=IMGFMT size=4294967296 forc

 === Read created image, default opts ====

-[                       0]  8388608/ 8388608 sectors not allocated at offset 0 bytes (0)
+[                       0]       4294967296 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=chs ====

-[                       0]  8388608/ 8388608 sectors not allocated at offset 0 bytes (0)
+[                       0]       4294967296 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=current_size ====

-[                       0]  8388608/ 8388608 sectors not allocated at offset 0 bytes (0)
+[                       0]       4294967296 bytes not allocated at offset 0 bytes (0)
 *** done
diff --git a/tests/qemu-iotests/179.out b/tests/qemu-iotests/179.out
index 2c3071b..33e9e3d 100644
--- a/tests/qemu-iotests/179.out
+++ b/tests/qemu-iotests/179.out
@@ -44,17 +44,17 @@ wrote 2097152/2097152 bytes at offset 60817408
 2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 2097152/2097152 bytes at offset 62914560
 2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-[                       0]     4096/  131072 sectors not allocated at offset 0 bytes (0)
-[                 2097152]     4096/  126976 sectors     allocated at offset 2 MiB (1)
-[                 4194304]     4096/  122880 sectors not allocated at offset 4 MiB (0)
-[                 6291456]     4096/  118784 sectors     allocated at offset 6 MiB (1)
-[                 8388608]     4096/  114688 sectors not allocated at offset 8 MiB (0)
-[                10485760]     4096/  110592 sectors     allocated at offset 10 MiB (1)
-[                12582912]     4096/  106496 sectors not allocated at offset 12 MiB (0)
-[                14680064]     4096/  102400 sectors     allocated at offset 14 MiB (1)
-[                16777216]     4096/   98304 sectors not allocated at offset 16 MiB (0)
-[                18874368]    77824/   94208 sectors     allocated at offset 18 MiB (1)
-[                58720256]    16384/   16384 sectors not allocated at offset 56 MiB (0)
+[                       0]          2097152 bytes not allocated at offset 0 bytes (0)
+[                 2097152]          2097152 bytes     allocated at offset 2 MiB (1)
+[                 4194304]          2097152 bytes not allocated at offset 4 MiB (0)
+[                 6291456]          2097152 bytes     allocated at offset 6 MiB (1)
+[                 8388608]          2097152 bytes not allocated at offset 8 MiB (0)
+[                10485760]          2097152 bytes     allocated at offset 10 MiB (1)
+[                12582912]          2097152 bytes not allocated at offset 12 MiB (0)
+[                14680064]          2097152 bytes     allocated at offset 14 MiB (1)
+[                16777216]          2097152 bytes not allocated at offset 16 MiB (0)
+[                18874368]         39845888 bytes     allocated at offset 18 MiB (1)
+[                58720256]          8388608 bytes not allocated at offset 56 MiB (0)
 [{ "start": 0, "length": 18874368, "depth": 1, "zero": true, "data": false},
 { "start": 18874368, "length": 2097152, "depth": 1, "zero": false, "data": true, "offset": OFFSET},
 { "start": 20971520, "length": 2097152, "depth": 1, "zero": true, "data": false},
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (7 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-28 20:48   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned Eric Blake
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes.  The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.

Update test 154 to cover the new scenarios, using two images of
intentionally differing length.

While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: rebase to better reporting of preallocated zero clusters
v9: new patch
---
 block/qcow2.c              |   7 +++
 tests/qemu-iotests/154     | 112 ++++++++++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/154.out |  83 +++++++++++++++++++++++++++++++++
 3 files changed, 200 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5c1573c..8038793 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
     BlockDriverState *file;
     int64_t res;

+    if (start + count > bs->total_sectors) {
+        count = bs->total_sectors - start;
+    }
+
     if (!count) {
         return true;
     }
@@ -2467,6 +2471,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     uint32_t tail = (offset + count) % s->cluster_size;

     trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count);
+    if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
+        tail = 0;
+    }

     if (head || tail) {
         int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index 7ca7219..005f09f 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -2,7 +2,7 @@
 #
 # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034)
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Copyright (C) 2016-2017 Red Hat, Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -42,7 +42,10 @@ _supported_proto file
 _supported_os Linux

 CLUSTER_SIZE=4k
-size=128M
+size=$((128 * 1024 * 1024))
+
+# This test requires zero clusters, added in v3 images
+_unsupported_imgopts compat=0.10

 echo
 echo == backing file contains zeros ==
@@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io

 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

+echo
+echo == unaligned image tail cluster, no allocation needed ==
+
+CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# With no backing file, write to all or part of unallocated partial cluster
+
+# Backing file: 128m: ... | 00 --
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Backing file: 128m: ... | -- 00
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Backing file: 128m: ... | 00 00
+$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io
+
+# No offset should be allocated, although the cluster itself is considered
+# allocated due to the zero flag
+$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# With backing file that reads as zero, write to all or part of entire cluster
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | 00 00 00 00
+$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | 00 00 -- --
+$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | -- -- 00 00
+$QEMU_IO -c "write -z $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | -- 00 00 --
+$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
+
+# No offset should be allocated, although the cluster itself is considered
+# allocated due to the zero flag
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Preallocated cluster is not freed when unmap is not requested
+
+# Backing file: 128m: ... | XX -- => ... | XX 00
+$QEMU_IO -c "write $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Backing file: 128m: ... | XX 00 => ... | 00 00
+$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map |
+    sed 's/"offset": [0-9]*/"offset": OFFSET/g'
+
+echo
+echo == unaligned image tail cluster, allocation required ==
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# Write beyond backing file must COW
+# Backing file: 128m: ... | XX --
+# Active layer: 128m: ... | -- -- 00 --
+
+$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# Writes at boundaries of (partial) cluster must not lose mid-cluster data
+# Backing file: 128m: ... | -- XX
+# Active layer: 128m: ... | 00 -- -- 00
+
+$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# Partial write must not lose data
+# Backing file: 128m: ... | -- --
+# Active layer: 128m: ... | -- -- XX 00
+
+$QEMU_IO -c "write -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index da9eabd..f8a09af 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -282,4 +282,87 @@ read 1024/1024 bytes at offset 76800
 { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
 { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]
+
+== unaligned image tail cluster, no allocation needed ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+1024/1024 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}]
+wrote 2048/2048 bytes at offset 134217728
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134218240
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false},
+{ "start": 134217728, "length": 1024, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
+
+== unaligned image tail cluster, allocation required ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1536/1536 bytes at offset 134218240
+1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
 *** done
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (8 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-05-03 17:56   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count Eric Blake
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.

Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: rebase to context
v9: rebase to master, by asserting that only tail cluster is unaligned
v7, v8: only earlier half of series submitted for 2.9
v6: avoid assertion on non-cluster-aligned image, use s->cluster_sectors
to avoid a shift, drop R-b
v5: no change
v4: new patch
---
 block/qcow2-cluster.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d542894..4f641a9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1572,11 +1572,10 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);

-    /* The caller must cluster-align start; round end down except at EOF */
+    /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
-        end_offset = start_of_cluster(s, end_offset);
-    }
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

     nb_clusters = size_to_clusters(s, end_offset - offset);

@@ -1657,9 +1656,17 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
                         int flags)
 {
     BDRVQcow2State *s = bs->opaque;
+    uint64_t end_offset;
     uint64_t nb_clusters;
     int ret;

+    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
+
+    /* Caller must pass aligned values, except at image end */
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+
     /* The zero flag is only supported by version 3 and newer */
     if (s->qcow_version < 3) {
         return -ENOTSUP;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (9 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-05-03 18:28   ` Max Reitz
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 12/17] blkdebug: Sanity check block layer guarantees Eric Blake
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v10: squash in fixup accounting for unaligned file end
v9: rebase to earlier changes, drop R-b
v7, v8: only earlier half of series submitted for 2.9
v6: rebase due to context
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2.h          |  9 +++++----
 block/qcow2-cluster.c  | 40 +++++++++++++++++++++-------------------
 block/qcow2-snapshot.c |  7 +++----
 block/qcow2.c          | 21 +++++++++------------
 4 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 8731f24..03f23bf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -547,10 +547,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                          int compressed_size);

 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags);

 int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4f641a9..a47aadc 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1562,16 +1562,16 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
@@ -1583,13 +1583,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     /* Each L2 table is handled by its own loop iteration */
     while (nb_clusters > 0) {
-        ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
-        if (ret < 0) {
+        cleared = discard_single_l2(bs, offset, nb_clusters, type,
+                                    full_discard);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
@@ -1652,16 +1654,15 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
@@ -1673,18 +1674,19 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     }

     /* Each L2 table is handled by its own loop iteration */
-    nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
+    nb_clusters = size_to_clusters(s, bytes);

     s->cache_discards = true;

     while (nb_clusters > 0) {
-        ret = zero_single_l2(bs, offset, nb_clusters, flags);
-        if (ret < 0) {
+        cleared = zero_single_l2(bs, offset, nb_clusters, flags);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0324243..44243e0 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -440,10 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)

     /* The VM state isn't needed any more in the active L1 table; in fact, it
      * hurts by causing expensive COW for the next snapshot. */
-    qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
-                           align_offset(sn->vm_state_size, s->cluster_size)
-                                >> BDRV_SECTOR_BITS,
-                           QCOW2_DISCARD_NEVER, false);
+    qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
+                          align_offset(sn->vm_state_size, s->cluster_size),
+                          QCOW2_DISCARD_NEVER, false);

 #ifdef DEBUG_ALLOC
     {
diff --git a/block/qcow2.c b/block/qcow2.c
index 8038793..4d34610 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2508,7 +2508,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);

     /* Whatever is left can use real zero clusters */
-    ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
+    ret = qcow2_cluster_zeroize(bs, offset, count, flags);
     qemu_co_mutex_unlock(&s->lock);

     return ret;
@@ -2531,8 +2531,8 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     }

     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
-                                 QCOW2_DISCARD_REQUEST, false);
+    ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
+                                false);
     qemu_co_mutex_unlock(&s->lock);
     return ret;
 }
@@ -2839,9 +2839,8 @@ fail:
 static int qcow2_make_empty(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t start_sector;
-    int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
-                       BDRV_SECTOR_SIZE);
+    uint64_t offset, end_offset;
+    int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
     int l1_clusters, ret = 0;

     l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2858,18 +2857,16 @@ static int qcow2_make_empty(BlockDriverState *bs)

     /* This fallback code simply discards every active cluster; this is slow,
      * but works in all cases */
-    for (start_sector = 0; start_sector < bs->total_sectors;
-         start_sector += sector_step)
+    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
+    for (offset = 0; offset < end_offset; offset += step)
     {
         /* As this function is generally used after committing an external
          * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
          * default action for this kind of discard is to pass the discard,
          * which will ideally result in an actually smaller image file, as
          * is probably desired. */
-        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
-                                     MIN(sector_step,
-                                         bs->total_sectors - start_sector),
-                                     QCOW2_DISCARD_SNAPSHOT, true);
+        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
+                                    QCOW2_DISCARD_SNAPSHOT, true);
         if (ret < 0) {
             break;
         }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 12/17] blkdebug: Sanity check block layer guarantees
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (10 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 13/17] blkdebug: Refactor error injection Eric Blake
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v10: no change
v5-v9: no change
v4: no change
v3: rebase to byte-based interfaces
v2: new patch
---
 block/blkdebug.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index c795ae9..14d3fc5 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -431,6 +431,13 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;

+    /* Sanity check block layer guarantees */
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+    if (bs->bl.max_transfer) {
+        assert(bytes <= bs->bl.max_transfer);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         uint64_t inject_offset = rule->options.inject.offset;

@@ -455,6 +462,13 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;

+    /* Sanity check block layer guarantees */
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+    if (bs->bl.max_transfer) {
+        assert(bytes <= bs->bl.max_transfer);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         uint64_t inject_offset = rule->options.inject.offset;

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 13/17] blkdebug: Refactor error injection
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (11 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 12/17] blkdebug: Sanity check block layer guarantees Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 14/17] blkdebug: Add pass-through write_zero and discard support Eric Blake
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v10: no change
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: new patch
---
 block/blkdebug.c | 74 +++++++++++++++++++++++++-------------------------------
 1 file changed, 33 insertions(+), 41 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 14d3fc5..db58db0 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -405,11 +405,30 @@ out:
     return ret;
 }

-static int inject_error(BlockDriverState *bs, BlkdebugRule *rule)
+static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
 {
     BDRVBlkdebugState *s = bs->opaque;
-    int error = rule->options.inject.error;
-    bool immediately = rule->options.inject.immediately;
+    BlkdebugRule *rule = NULL;
+    int error;
+    bool immediately;
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        uint64_t inject_offset = rule->options.inject.offset;
+
+        if (inject_offset == -1 ||
+            (bytes && inject_offset >= offset &&
+             inject_offset < offset + bytes))
+        {
+            break;
+        }
+    }
+
+    if (!rule || !rule->options.inject.error) {
+        return 0;
+    }
+
+    immediately = rule->options.inject.immediately;
+    error = rule->options.inject.error;

     if (rule->options.inject.once) {
         QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
@@ -428,8 +447,7 @@ static int coroutine_fn
 blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                    QEMUIOVector *qiov, int flags)
 {
-    BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
+    int err;

     /* Sanity check block layer guarantees */
     assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
@@ -438,18 +456,9 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         assert(bytes <= bs->bl.max_transfer);
     }

-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        uint64_t inject_offset = rule->options.inject.offset;
-
-        if (inject_offset == -1 ||
-            (inject_offset >= offset && inject_offset < offset + bytes))
-        {
-            break;
-        }
-    }
-
-    if (rule && rule->options.inject.error) {
-        return inject_error(bs, rule);
+    err = rule_check(bs, offset, bytes);
+    if (err) {
+        return err;
     }

     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -459,8 +468,7 @@ static int coroutine_fn
 blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                     QEMUIOVector *qiov, int flags)
 {
-    BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
+    int err;

     /* Sanity check block layer guarantees */
     assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
@@ -469,18 +477,9 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         assert(bytes <= bs->bl.max_transfer);
     }

-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        uint64_t inject_offset = rule->options.inject.offset;
-
-        if (inject_offset == -1 ||
-            (inject_offset >= offset && inject_offset < offset + bytes))
-        {
-            break;
-        }
-    }
-
-    if (rule && rule->options.inject.error) {
-        return inject_error(bs, rule);
+    err = rule_check(bs, offset, bytes);
+    if (err) {
+        return err;
     }

     return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -488,17 +487,10 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,

 static int blkdebug_co_flush(BlockDriverState *bs)
 {
-    BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
+    int err = rule_check(bs, 0, 0);

-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        if (rule->options.inject.offset == -1) {
-            break;
-        }
-    }
-
-    if (rule && rule->options.inject.error) {
-        return inject_error(bs, rule);
+    if (err) {
+        return err;
     }

     return bdrv_co_flush(bs->file->bs);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 14/17] blkdebug: Add pass-through write_zero and discard support
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (12 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 13/17] blkdebug: Refactor error injection Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 15/17] blkdebug: Simplify override logic Eric Blake
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v10: no change
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: tighten check of unaligned requests, rebase on rule check
refactoring, drop R-b
v5: include 2017 copyright
v4: correct error injection to respect byte range, tweak formatting
v3: rebase to byte-based read/write, improve docs on why no
partial write zero passthrough
v2: new patch
---
 block/blkdebug.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index db58db0..62d113c 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1,6 +1,7 @@
 /*
  * Block protocol for I/O error injection
  *
+ * Copyright (C) 2016-2017 Red Hat, Inc.
  * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -382,6 +383,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }

+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags;
+    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags;
+
     /* Set request alignment */
     align = qemu_opt_get_size(opts, "align", 0);
     if (align < INT_MAX && is_power_of_2(align)) {
@@ -496,6 +502,72 @@ static int blkdebug_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(bs->file->bs);
 }

+static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
+                                                  int64_t offset, int count,
+                                                  BdrvRequestFlags flags)
+{
+    uint32_t align = MAX(bs->bl.request_alignment,
+                         bs->bl.pwrite_zeroes_alignment);
+    int err;
+
+    /* Only pass through requests that are larger than requested
+     * preferred alignment (so that we test the fallback to writes on
+     * unaligned portions), and check that the block layer never hands
+     * us anything unaligned that crosses an alignment boundary.  */
+    if (count < align) {
+        assert(QEMU_IS_ALIGNED(offset, align) ||
+               QEMU_IS_ALIGNED(offset + count, align) ||
+               DIV_ROUND_UP(offset, align) ==
+               DIV_ROUND_UP(offset + count, align));
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, align));
+    assert(QEMU_IS_ALIGNED(count, align));
+    if (bs->bl.max_pwrite_zeroes) {
+        assert(count <= bs->bl.max_pwrite_zeroes);
+    }
+
+    err = rule_check(bs, offset, count);
+    if (err) {
+        return err;
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
+}
+
+static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int count)
+{
+    uint32_t align = bs->bl.pdiscard_alignment;
+    int err;
+
+    /* Only pass through requests that are larger than requested
+     * minimum alignment, and ensure that unaligned requests do not
+     * cross optimum discard boundaries. */
+    if (count < bs->bl.request_alignment) {
+        assert(QEMU_IS_ALIGNED(offset, align) ||
+               QEMU_IS_ALIGNED(offset + count, align) ||
+               DIV_ROUND_UP(offset, align) ==
+               DIV_ROUND_UP(offset + count, align));
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
+    if (align && count >= align) {
+        assert(QEMU_IS_ALIGNED(offset, align));
+        assert(QEMU_IS_ALIGNED(count, align));
+    }
+    if (bs->bl.max_pdiscard) {
+        assert(count <= bs->bl.max_pdiscard);
+    }
+
+    err = rule_check(bs, offset, count);
+    if (err) {
+        return err;
+    }
+
+    return bdrv_co_pdiscard(bs->file->bs, offset, count);
+}

 static void blkdebug_close(BlockDriverState *bs)
 {
@@ -750,6 +822,8 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_co_preadv         = blkdebug_co_preadv,
     .bdrv_co_pwritev        = blkdebug_co_pwritev,
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
+    .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
+    .bdrv_co_pdiscard       = blkdebug_co_pdiscard,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 15/17] blkdebug: Simplify override logic
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (13 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 14/17] blkdebug: Add pass-through write_zero and discard support Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 16/17] blkdebug: Add ability to override unmap geometries Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 17/17] tests: Add coverage for recent block geometry fixes Eric Blake
  16 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Move the errno
assignment into a label that will be reused from more places in
the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

---
v10: no change
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: tweak error message, but R-b kept
v5: no change
v4: fix typo in commit message, move errno assignment
v3: new patch
---
 block/blkdebug.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 62d113c..a1501eb 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -38,7 +38,7 @@
 typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
-    int align;
+    uint64_t align;

     /* For blkdebug_refresh_filename() */
     char *config_file;
@@ -353,7 +353,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVBlkdebugState *s = bs->opaque;
     QemuOpts *opts;
     Error *local_err = NULL;
-    uint64_t align;
     int ret;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
@@ -389,12 +388,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         bs->file->bs->supported_zero_flags;

     /* Set request alignment */
-    align = qemu_opt_get_size(opts, "align", 0);
-    if (align < INT_MAX && is_power_of_2(align)) {
-        s->align = align;
-    } else if (align) {
-        error_setg(errp, "Invalid alignment");
-        ret = -EINVAL;
+    s->align = qemu_opt_get_size(opts, "align", 0);
+    if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
+        error_setg(errp, "Cannot meet constraints with align %" PRIu64,
+                   s->align);
         goto fail_unref;
     }

@@ -402,6 +399,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     goto out;

 fail_unref:
+    ret = -EINVAL;
     bdrv_unref_child(bs, bs->file);
 out:
     if (ret < 0) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 16/17] blkdebug: Add ability to override unmap geometries
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (14 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 15/17] blkdebug: Simplify override logic Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 17/17] tests: Add coverage for recent block geometry fixes Eric Blake
  16 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block, Markus Armbruster

Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v10: no change
v9: rebase to master (dropped #optional tags)
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: more tweaks in docs and error messages
v5: tweak docs regarding max-transfer minimum
v4: relax 512 byte minimum now that blkdebug is byte-based, fix doc typo
v3: improve legibility of bounds checking, improve docs
v2: new patch
---
 qapi/block-core.json | 33 ++++++++++++++++--
 block/blkdebug.c     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 87fb747..2082242 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2430,8 +2430,33 @@
 #
 # @config:          filename of the configuration file
 #
-# @align:           required alignment for requests in bytes,
-#                   must be power of 2, or 0 for default
+# @align:           required alignment for requests in bytes, must be
+#                   positive power of 2, or 0 for default
+#
+# @max-transfer:    maximum size for I/O transfers in bytes, must be
+#                   positive multiple of @align and of the underlying
+#                   file's request alignment (but need not be a power of
+#                   2), or 0 for default (since 2.10)
+#
+# @opt-write-zero:  preferred alignment for write zero requests in bytes,
+#                   must be positive multiple of @align and of the
+#                   underlying file's request alignment (but need not be a
+#                   power of 2), or 0 for default (since 2.10)
+#
+# @max-write-zero:  maximum size for write zero requests in bytes, must be
+#                   positive multiple of @align, of @opt-write-zero, and of
+#                   the underlying file's request alignment (but need not
+#                   be a power of 2), or 0 for default (since 2.10)
+#
+# @opt-discard:     preferred alignment for discard requests in bytes, must
+#                   be positive multiple of @align and of the underlying
+#                   file's request alignment (but need not be a power of
+#                   2), or 0 for default (since 2.10)
+#
+# @max-discard:     maximum size for discard requests in bytes, must be
+#                   positive multiple of @align, of @opt-discard, and of
+#                   the underlying file's request alignment (but need not
+#                   be a power of 2), or 0 for default (since 2.10)
 #
 # @inject-error:    array of error injection descriptions
 #
@@ -2442,7 +2467,9 @@
 { 'struct': 'BlockdevOptionsBlkdebug',
   'data': { 'image': 'BlockdevRef',
             '*config': 'str',
-            '*align': 'int',
+            '*align': 'int', '*max-transfer': 'int32',
+            '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
+            '*opt-discard': 'int32', '*max-discard': 'int32',
             '*inject-error': ['BlkdebugInjectErrorOptions'],
             '*set-state': ['BlkdebugSetStateOptions'] } }

diff --git a/block/blkdebug.c b/block/blkdebug.c
index a1501eb..4ccdceb 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -39,6 +39,11 @@ typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
     uint64_t align;
+    uint64_t max_transfer;
+    uint64_t opt_write_zero;
+    uint64_t max_write_zero;
+    uint64_t opt_discard;
+    uint64_t max_discard;

     /* For blkdebug_refresh_filename() */
     char *config_file;
@@ -343,6 +348,31 @@ static QemuOptsList runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Required alignment in bytes",
         },
+        {
+            .name = "max-transfer",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum transfer size in bytes",
+        },
+        {
+            .name = "opt-write-zero",
+            .type = QEMU_OPT_SIZE,
+            .help = "Optimum write zero alignment in bytes",
+        },
+        {
+            .name = "max-write-zero",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum write zero size in bytes",
+        },
+        {
+            .name = "opt-discard",
+            .type = QEMU_OPT_SIZE,
+            .help = "Optimum discard alignment in bytes",
+        },
+        {
+            .name = "max-discard",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum discard size in bytes",
+        },
         { /* end of list */ }
     },
 };
@@ -354,6 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     QemuOpts *opts;
     Error *local_err = NULL;
     int ret;
+    uint64_t align;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -387,13 +418,61 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
         bs->file->bs->supported_zero_flags;

-    /* Set request alignment */
+    /* Set alignment overrides */
     s->align = qemu_opt_get_size(opts, "align", 0);
     if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
         error_setg(errp, "Cannot meet constraints with align %" PRIu64,
                    s->align);
         goto fail_unref;
     }
+    align = MAX(s->align, bs->file->bs->bl.request_alignment);
+
+    s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
+    if (s->max_transfer &&
+        (s->max_transfer >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_transfer, align))) {
+        error_setg(errp, "Cannot meet constraints with max-transfer %" PRIu64,
+                   s->max_transfer);
+        goto fail_unref;
+    }
+
+    s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
+    if (s->opt_write_zero &&
+        (s->opt_write_zero >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
+        error_setg(errp, "Cannot meet constraints with opt-write-zero %" PRIu64,
+                   s->opt_write_zero);
+        goto fail_unref;
+    }
+
+    s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
+    if (s->max_write_zero &&
+        (s->max_write_zero >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_write_zero,
+                          MAX(s->opt_write_zero, align)))) {
+        error_setg(errp, "Cannot meet constraints with max-write-zero %" PRIu64,
+                   s->max_write_zero);
+        goto fail_unref;
+    }
+
+    s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
+    if (s->opt_discard &&
+        (s->opt_discard >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->opt_discard, align))) {
+        error_setg(errp, "Cannot meet constraints with opt-discard %" PRIu64,
+                   s->opt_discard);
+        goto fail_unref;
+    }
+
+    s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
+    if (s->max_discard &&
+        (s->max_discard >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_discard,
+                          MAX(s->opt_discard, align)))) {
+        error_setg(errp, "Cannot meet constraints with max-discard %" PRIu64,
+                   s->max_discard);
+        goto fail_unref;
+    }

     ret = 0;
     goto out;
@@ -793,6 +872,21 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
     if (s->align) {
         bs->bl.request_alignment = s->align;
     }
+    if (s->max_transfer) {
+        bs->bl.max_transfer = s->max_transfer;
+    }
+    if (s->opt_write_zero) {
+        bs->bl.pwrite_zeroes_alignment = s->opt_write_zero;
+    }
+    if (s->max_write_zero) {
+        bs->bl.max_pwrite_zeroes = s->max_write_zero;
+    }
+    if (s->opt_discard) {
+        bs->bl.pdiscard_alignment = s->opt_discard;
+    }
+    if (s->max_discard) {
+        bs->bl.max_pdiscard = s->max_discard;
+    }
 }

 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [Qemu-devel] [PATCH v10 17/17] tests: Add coverage for recent block geometry fixes
  2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
                   ` (15 preceding siblings ...)
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 16/17] blkdebug: Add ability to override unmap geometries Eric Blake
@ 2017-04-27  1:46 ` Eric Blake
  16 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-27  1:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Use blkdebug's new geometry constraints to emulate setups that
have caused recent regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---

[yes, I added tests 177 and 179 out of order during this series.
Oh well; I'm tired of renumbering this one.]

v10: no change, rebase to context
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: rebase to master by renumbering s/175/177/
v5: rebase to master by renumbering s/173/175/
v4: clean up some comments, nicer backing file creation, more commit message
v3: make comments tied more to test at hand, rather than the
particular hardware that led to the earlier patches being tested
v2: new patch
---
 tests/qemu-iotests/177     | 114 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/177.out |  49 +++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 164 insertions(+)
 create mode 100755 tests/qemu-iotests/177
 create mode 100644 tests/qemu-iotests/177.out

diff --git a/tests/qemu-iotests/177 b/tests/qemu-iotests/177
new file mode 100755
index 0000000..e4ddec7
--- /dev/null
+++ b/tests/qemu-iotests/177
@@ -0,0 +1,114 @@
+#!/bin/bash
+#
+# Test corner cases with unusual block geometries
+#
+# Copyright (C) 2016-2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+
+CLUSTER_SIZE=1M
+size=128M
+options=driver=blkdebug,image.driver=qcow2
+
+echo
+echo "== setting up files =="
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+$QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
+_make_test_img -b "$TEST_IMG.base"
+$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
+
+# Limited to 64k max-transfer
+echo
+echo "== constrained alignment and max-transfer =="
+limits=align=4k,max-transfer=64k
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -P 33 1000 128k" -c "read -P 33 1000 128k" | _filter_qemu_io
+
+echo
+echo "== write zero with constrained max-transfer =="
+limits=align=512,max-transfer=64k,opt-write-zero=$CLUSTER_SIZE
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -z 8003584 2093056" | _filter_qemu_io
+
+# non-power-of-2 write-zero/discard alignments
+echo
+echo "== non-power-of-2 write zeroes limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -z 32M 32M" | _filter_qemu_io
+
+echo
+echo "== non-power-of-2 discard limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "discard 80000001 30M" | _filter_qemu_io
+
+echo
+echo "== verify image content =="
+
+function verify_io()
+{
+    if ($QEMU_IMG info -f "$IMGFMT" "$TEST_IMG" |
+	    grep "compat: 0.10" > /dev/null); then
+        # For v2 images, discarded clusters are read from the backing file
+        discarded=11
+    else
+        # Discarded clusters are zeroed for v3 or later
+        discarded=0
+    fi
+
+    echo read -P 22 0 1000
+    echo read -P 33 1000 128k
+    echo read -P 22 132072 7871512
+    echo read -P 0 8003584 2093056
+    echo read -P 22 10096640 23457792
+    echo read -P 0 32M 32M
+    echo read -P 22 64M 13M
+    echo read -P $discarded 77M 29M
+    echo read -P 22 106M 22M
+}
+
+verify_io | $QEMU_IO "$TEST_IMG" | _filter_qemu_io
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+status=0
diff --git a/tests/qemu-iotests/177.out b/tests/qemu-iotests/177.out
new file mode 100644
index 0000000..e887542
--- /dev/null
+++ b/tests/qemu-iotests/177.out
@@ -0,0 +1,49 @@
+QA output created by 177
+
+== setting up files ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== constrained alignment and max-transfer ==
+wrote 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== write zero with constrained max-transfer ==
+wrote 2093056/2093056 bytes at offset 8003584
+1.996 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== non-power-of-2 write zeroes limits ==
+wrote 33554432/33554432 bytes at offset 33554432
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== non-power-of-2 discard limits ==
+discard 31457280/31457280 bytes at offset 80000001
+30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify image content ==
+read 1000/1000 bytes at offset 0
+1000 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 7871512/7871512 bytes at offset 132072
+7.507 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2093056/2093056 bytes at offset 8003584
+1.996 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 23457792/23457792 bytes at offset 10096640
+22.371 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 33554432/33554432 bytes at offset 33554432
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 13631488/13631488 bytes at offset 67108864
+13 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 30408704/30408704 bytes at offset 80740352
+29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 23068672/23068672 bytes at offset 111149056
+22 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 11ba1d9..395c72a 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -169,5 +169,6 @@
 174 auto
 175 auto quick
 176 rw auto backing
+177 rw auto quick
 179 rw auto quick
 181 rw auto migration
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
@ 2017-04-28 17:12   ` Max Reitz
  2017-04-28 20:18   ` Eric Blake
  1 sibling, 0 replies; 40+ messages in thread
From: Max Reitz @ 2017-04-28 17:12 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 682 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, couple with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  As the 8-way
> table is the intended semantics, simplify the rest of the text
> to get rid of the confusion.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: new patch
> ---
>  include/block/block.h | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>

Thanks!


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters Eric Blake
@ 2017-04-28 17:35   ` Max Reitz
  2017-04-28 19:04     ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 17:35 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5028 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> We were throwing away the preallocation information associated with
> zero clusters.  But we should be matching the well-defined semantics
> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
> while still reminding the user that reading from that offset is
> likely to read garbage.
> 
> Making this change lets us see which portions of an image are zero
> but preallocated, when using qemu-img map --output=json.  The
> --output=human side intentionally ignores all zero clusters, whether
> or not they are preallocated.
> 
> The fact that there is no change to qemu-iotests './check -qcow2'
> merely means that we aren't yet testing this aspect of qemu-img;
> a later patch will add a test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: new patch
> ---
>  block/qcow2-cluster.c | 32 +++++++++++++++++++++++++++-----
>  1 file changed, 27 insertions(+), 5 deletions(-)

I'd propose you split the qcow2 changes off of this series.

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 100398c..d1063df 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -328,6 +328,10 @@ static int count_contiguous_clusters(int nb_clusters, int cluster_size,
>  	return i;
>  }
> 
> +/*
> + * Checks how many consecutive clusters in a given L2 table have the same
> + * cluster type with no corresponding allocation.
> + */
>  static int count_contiguous_clusters_by_type(int nb_clusters,
>                                               uint64_t *l2_table,
>                                               int wanted_type)
> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int nb_clusters,
>      int i;
> 
>      for (i = 0; i < nb_clusters; i++) {
> -        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
> +        uint64_t entry = be64_to_cpu(l2_table[i]);
> +        int type = qcow2_get_cluster_type(entry);
> 
> -        if (type != wanted_type) {
> +        if (type != wanted_type || entry & L2E_OFFSET_MASK) {

This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
what the comment you added describes, but this may still warrant a
renaming of this function (count_contiguous_unallocated_clusters?) and
probably an assertion that wanted_type is ZERO or UNALLOCATED.

>              break;
>          }
>      }
> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>              ret = -EIO;
>              goto fail;
>          }
> -        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
> -                                              QCOW2_CLUSTER_ZERO);
> -        *cluster_offset = 0;
> +        /* Distinguish between pure zero clusters and pre-allocated ones */
> +        if (*cluster_offset & L2E_OFFSET_MASK) {
> +            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
> +                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);

You should probably switch this patch and the next one -- or I just send
my patches myself and you base your series on top of it...

Because before the next patch, this happens:

$ ./qemu-img create -f qcow2 foo.qcow2 64M
Formatting 'foo.qcow2', fmt=qcow2 size=67108864 encryption=off
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./qemu-io -c 'write 0 64M' -c 'write -z 0 64M' foo.qcow2
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.1846 sec (346.590 MiB/sec and 5.4155 ops/sec)
wrote 67108864/67108864 bytes at offset 0
64 MiB, 1 ops; 0.0004 sec (127.551 GiB/sec and 2040.8163 ops/sec)
$ ./qemu-img map foo.qcow2
Offset          Length          Mapped to       File
qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
failed.
[1]    4970 abort (core dumped)  ./qemu-img map foo.qcow2

Max

> +            *cluster_offset &= L2E_OFFSET_MASK;
> +            if (offset_into_cluster(s, *cluster_offset)) {
> +                qcow2_signal_corruption(bs, true, -1, -1,
> +                                        "Preallocated zero cluster offset %#"
> +                                        PRIx64 " unaligned (L2 offset: %#"
> +                                        PRIx64 ", L2 index: %#x)",
> +                                        *cluster_offset, l2_offset, l2_index);
> +                ret = -EIO;
> +                goto fail;
> +            }
> +        } else {
> +            c = count_contiguous_clusters_by_type(nb_clusters,
> +                                                  &l2_table[l2_index],
> +                                                  QCOW2_CLUSTER_ZERO);
> +            *cluster_offset = 0;
> +        }
>          break;
>      case QCOW2_CLUSTER_UNALLOCATED:
>          /* how many empty clusters ? */
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse preallocated zero clusters
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse " Eric Blake
@ 2017-04-28 17:51   ` Max Reitz
  0 siblings, 0 replies; 40+ messages in thread
From: Max Reitz @ 2017-04-28 17:51 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5942 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> From: Max Reitz <mreitz@redhat.com>
> 
> Instead of just freeing preallocated zero clusters and completely
> allocating them from scratch, reuse them.
> 
> We cannot do this in handle_copied(), however, since this is a COW
> operation. Therefore, we have to add the new logic to handle_alloc() and
> simply return the existing offset if it exists. The only catch is that
> we have to convince qcow2_alloc_cluster_link_l2() not to free the old
> clusters (because we have reused them).
> 
> Reported-by: Eric Blake <eblake@redhat.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: new patch. Max hasn't posted the patch directly on list, but
> did mention it here:
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03936.html
> and that he would post it once he has tests. Well, my later patches
> add a test that requires this one :)  The other two patches that
> he mentioned there are also good, but not essential for my series
> (and I didn't want to write tests to expose the behavior difference,
> because it would deprive Max of that fun).

Well, the main reason I didn't send the patches yet is because I was
tired while writing them ("Date: Sat Apr 22 01:17:52 2017 +0200") and I
wanted to take another look before sending them. I guess now's as good a
time as any.

> ---
>  block/qcow2.h         |  3 ++
>  block/qcow2-cluster.c | 83 +++++++++++++++++++++++++++++++++++----------------
>  2 files changed, 60 insertions(+), 26 deletions(-)

[...]

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d1063df..db3d937 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c

[...]

> @@ -1192,31 +1199,53 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>       * wrong with our code. */
>      assert(nb_clusters > 0);
> 
> +    if (!*host_offset && qcow2_get_cluster_type(entry) == QCOW2_CLUSTER_ZERO &&
> +        (entry & L2E_OFFSET_MASK) != 0 && (entry & QCOW_OFLAG_COPIED))

*host_offset works with this, too, if
start_of_cluster(s, *host_offset) == (entry & L2E_OFFSET_MASK).

If !(entry & QCOW_OFLAG_COPIED), we should check whether the refcount
maybe is 1 and then set OFLAG_COPIED. But that is something we don't
even do for normal clusters yet, so it's something to fix another day.

> +    {
> +        /* Try to reuse preallocated zero clusters; contiguous normal clusters
> +         * would be fine, too, but count_cow_clusters() above has limited
> +         * nb_clusters already to a range of COW clusters */
> +        int preallocated_nb_clusters =
> +            count_contiguous_clusters(nb_clusters, s->cluster_size, l2_table,

s/l2_table/&l2_table[l2_index]/

> +                                      QCOW_OFLAG_COPIED);
> +
> +        if (preallocated_nb_clusters) {

preallocated_nb_clusters must be at least 1, so an assertion would be
better.

Max

> +            nb_clusters = preallocated_nb_clusters;
> +            alloc_cluster_offset = entry & L2E_OFFSET_MASK;
> +
> +            /* We want to reuse these clusters, so qcow2_alloc_cluster_link_l2()
> +             * should not free them. */
> +            keep_old_clusters = true;
> +        }
> +    }
> +
>      qcow2_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
> 
> -    /* Allocate, if necessary at a given offset in the image file */
> -    alloc_cluster_offset = start_of_cluster(s, *host_offset);
> -    ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> -                                  &nb_clusters);
> -    if (ret < 0) {
> -        goto fail;
> -    }
> -
> -    /* Can't extend contiguous allocation */
> -    if (nb_clusters == 0) {
> -        *bytes = 0;
> -        return 0;
> -    }
> -
> -    /* !*host_offset would overwrite the image header and is reserved for "no
> -     * host offset preferred". If 0 was a valid host offset, it'd trigger the
> -     * following overlap check; do that now to avoid having an invalid value in
> -     * *host_offset. */
>      if (!alloc_cluster_offset) {
> -        ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
> -                                            nb_clusters * s->cluster_size);
> -        assert(ret < 0);
> -        goto fail;
> +        /* Allocate, if necessary at a given offset in the image file */
> +        alloc_cluster_offset = start_of_cluster(s, *host_offset);
> +        ret = do_alloc_cluster_offset(bs, guest_offset, &alloc_cluster_offset,
> +                                      &nb_clusters);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +
> +        /* Can't extend contiguous allocation */
> +        if (nb_clusters == 0) {
> +            *bytes = 0;
> +            return 0;
> +        }
> +
> +        /* !*host_offset would overwrite the image header and is reserved for
> +         * "no host offset preferred". If 0 was a valid host offset, it'd
> +         * trigger the following overlap check; do that now to avoid having an
> +         * invalid value in *host_offset. */
> +        if (!alloc_cluster_offset) {
> +            ret = qcow2_pre_write_overlap_check(bs, 0, alloc_cluster_offset,
> +                                                nb_clusters * s->cluster_size);
> +            assert(ret < 0);
> +            goto fail;
> +        }
>      }
> 
>      /*
> @@ -1247,6 +1276,8 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
>          .offset         = start_of_cluster(s, guest_offset),
>          .nb_clusters    = nb_clusters,
> 
> +        .keep_old_clusters  = keep_old_clusters,
> +
>          .cow_start = {
>              .offset     = 0,
>              .nb_bytes   = offset_into_cluster(s, guest_offset),
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
@ 2017-04-28 18:00   ` Max Reitz
  2017-04-28 19:11     ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 18:00 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3488 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> Similar to discard_single_l2(), we should try to avoid dirtying
> the L2 cache when the cluster we are changing already has the
> right characteristics.
> 
> Note that by the time we get to zero_single_l2(), BDRV_REQ_MAY_UNMAP
> is a requirement to unallocate a cluster (this is because the block
> layer clears that flag if discard.* flags during open requested that
> we never punch holes - see the conversation around commit 170f4b2e,
> https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg07306.html).
> Therefore, this patch can only reuse a zero cluster as-is if either
> unmapping is not requested, or if the zero cluster was not associated
> with an allocation.
> 
> Technically, there are some cases where an unallocated cluster
> already reads as all zeroes (namely, when there is no backing file
> [easy: check bs->backing], or when the backing file also reads as
> zeroes [harder: we can't check bdrv_get_block_status since we are
> already holding the lock]), where the guest would not immediately see
> a difference if we left that cluster unallocated.  But if the user
> did not request unmapping, leaving an unallocated cluster is wrong;
> and even if the user DID request unmapping, keeping a cluster
> unallocated risks a subtle semantic change of guest-visible contents
> if a backing file is later added, and it is not worth auditing
> whether all internal uses such as mirror properly avoid an unmap
> request.  Thus, this patch is intentionally limited to just clusters
> that are already marked as zero.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: new patch, replacing earlier attempt to use unallocated clusters,
> and ditching any optimization of v2 files
> ---
>  block/qcow2-cluster.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index db3d937..d542894 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1614,6 +1614,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>      int l2_index;
>      int ret;
>      int i;
> +    bool unmap = !!(flags & BDRV_REQ_MAY_UNMAP);
> 
>      ret = get_cluster_table(bs, offset, &l2_table, &l2_index);
>      if (ret < 0) {
> @@ -1629,9 +1630,17 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
> 
>          old_offset = be64_to_cpu(l2_table[l2_index + i]);
> 
> -        /* Update L2 entries */
> +        /*
> +         * Minimize L2 changes if the cluster already reads back as
> +         * zeroes with correct allocation.
> +         */
> +        if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
> +            !(unmap && old_offset & L2E_OFFSET_MASK)) {
> +            continue;
> +        }
> +
>          qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
> -        if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
> +        if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {

Works, but I'd like it better to store the cluster type somewhere and
then use it here instead of checking the COMPRESSED flag.

But it works, so it's up to you:

Reviewed-by: Max Reitz <mreitz@redhat.com>

>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>              qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>          } else {
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters
  2017-04-28 17:35   ` Max Reitz
@ 2017-04-28 19:04     ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-28 19:04 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 4169 bytes --]

On 04/28/2017 12:35 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We were throwing away the preallocation information associated with
>> zero clusters.  But we should be matching the well-defined semantics
>> in bdrv_get_block_status(), where (BDRV_BLOCK_ZERO |
>> BDRV_BLOCK_OFFSET_VALID) informs the user which offset is reserved,
>> while still reminding the user that reading from that offset is
>> likely to read garbage.
>>
>> Making this change lets us see which portions of an image are zero
>> but preallocated, when using qemu-img map --output=json.  The
>> --output=human side intentionally ignores all zero clusters, whether
>> or not they are preallocated.
>>
>> The fact that there is no change to qemu-iotests './check -qcow2'
>> merely means that we aren't yet testing this aspect of qemu-img;
>> a later patch will add a test.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> v10: new patch
>> ---
>>  block/qcow2-cluster.c | 32 +++++++++++++++++++++++++++-----
>>  1 file changed, 27 insertions(+), 5 deletions(-)
> 
> I'd propose you split the qcow2 changes off of this series.

Yeah, it's sort of morphed into a well-tested later part and an earlier
part that is still undergoing heavy review.  I don't know how much churn
there would be to rebase things to do the blkdebug changes first (it may
invalidate portions of my test 177; but I could always tag such spots as
FIXME while waiting on the qcow2 part to settle and land).  I'll see
what I can do, as I've now got multiple series queued up, and should at
least get SOMETHING to land to start clearing up the logjam.


>> @@ -335,9 +339,10 @@ static int count_contiguous_clusters_by_type(int nb_clusters,
>>      int i;
>>
>>      for (i = 0; i < nb_clusters; i++) {
>> -        int type = qcow2_get_cluster_type(be64_to_cpu(l2_table[i]));
>> +        uint64_t entry = be64_to_cpu(l2_table[i]);
>> +        int type = qcow2_get_cluster_type(entry);
>>
>> -        if (type != wanted_type) {
>> +        if (type != wanted_type || entry & L2E_OFFSET_MASK) {
> 
> This works only for wanted_type \in \{ ZERO, UNALLOCATED \} -- which is
> what the comment you added describes, but this may still warrant a
> renaming of this function (count_contiguous_unallocated_clusters?) and
> probably an assertion that wanted_type is ZERO or UNALLOCATED.

Good idea.


>> @@ -559,9 +564,26 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>>              ret = -EIO;
>>              goto fail;
>>          }
>> -        c = count_contiguous_clusters_by_type(nb_clusters, &l2_table[l2_index],
>> -                                              QCOW2_CLUSTER_ZERO);
>> -        *cluster_offset = 0;
>> +        /* Distinguish between pure zero clusters and pre-allocated ones */
>> +        if (*cluster_offset & L2E_OFFSET_MASK) {
>> +            c = count_contiguous_clusters(nb_clusters, s->cluster_size,
>> +                                          &l2_table[l2_index], QCOW_OFLAG_ZERO);
> 
> You should probably switch this patch and the next one -- or I just send
> my patches myself and you base your series on top of it...

I tested with your patches in, then rebased with your patches out to see
what broke, and...

> 
> Because before the next patch, this happens:
> 

> $ ./qemu-img map foo.qcow2
> Offset          Length          Mapped to       File
> qemu-img: block/qcow2-cluster.c:319: count_contiguous_clusters:
> Assertion `qcow2_get_cluster_type(first_entry) == QCOW2_CLUSTER_NORMAL'
> failed.

got that same assert. So I realized that your patch was necessary and
rebased it back in, but obviously forgot to re-test at all points in the
series. Yes, your should go first, and I'd be more than happy if you
post yours formally and I rebase my qcow2 portions on top of yours (all
the more reason for me to split this series into two, and get the
blkdebug portions in now while we still hammer on the qcow2 stuff).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn
  2017-04-28 18:00   ` Max Reitz
@ 2017-04-28 19:11     ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-28 19:11 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1605 bytes --]

On 04/28/2017 01:00 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> Similar to discard_single_l2(), we should try to avoid dirtying
>> the L2 cache when the cluster we are changing already has the
>> right characteristics.
>>

>> -        /* Update L2 entries */
>> +        /*
>> +         * Minimize L2 changes if the cluster already reads back as
>> +         * zeroes with correct allocation.
>> +         */
>> +        if (qcow2_get_cluster_type(old_offset) == QCOW2_CLUSTER_ZERO &&
>> +            !(unmap && old_offset & L2E_OFFSET_MASK)) {
>> +            continue;
>> +        }
>> +
>>          qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
>> -        if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
>> +        if (old_offset & QCOW_OFLAG_COMPRESSED || unmap) {
> 
> Works, but I'd like it better to store the cluster type somewhere and
> then use it here instead of checking the COMPRESSED flag.

Indeed, qcow2_get_cluster_type() checks QCOW_OFLAG_COMPRESSED first -
but I like the idea of storing it in a temporary variable now that we
are using the cluster type in more than one place.

> 
> But it works, so it's up to you:
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
>>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>>              qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
>>          } else {
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
@ 2017-04-28 19:28   ` Max Reitz
  2017-04-28 19:48     ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 19:28 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1423 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> No tests were covering write zeroes with unmap.  Additionally,
> I needed to prove that my previous patches for correct status
> reporting and write zeroes optimizations actually had an impact.
> 
> The test works for cluster_size between 8k and 2M (for smaller
> sizes, it fails because our allocation patterns are not contiguous
> with small clusters).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: drop any changes to v2 files, rewrite test to work with updates
> earlier in the series, add a blkdebug probe
> v9: new patch
> ---
>  tests/qemu-iotests/179     | 123 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/179.out |  90 +++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 214 insertions(+)
>  create mode 100755 tests/qemu-iotests/179
>  create mode 100644 tests/qemu-iotests/179.out

Can I convince you to write this test in a different way? I really have
a hard time following all of the writes and building a mental image of
how the qcow2 file should look in the end so I can compare it to the map
output.

I'd like it better if you would check the qcow2 file after each group of
write operations, maybe even creating a new image for every such group.
But even just having more qemu-img map calls (and changing nothing else)
would help me a lot.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED Eric Blake
@ 2017-04-28 19:30   ` Max Reitz
  0 siblings, 0 replies; 40+ messages in thread
From: Max Reitz @ 2017-04-28 19:30 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> Manual comparison against 0x1ff is not as clean as using our
> alignment macros from osdep.h.
> 
> Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: new patch
> ---
>  qemu-io-cmds.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
@ 2017-04-28 19:46   ` Max Reitz
  2017-04-28 19:59     ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 19:46 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> For the 'alloc' command, accepting an offset in bytes but a length
> in sectors, and reporting output in sectors, is confusing.  Do
> everything in bytes, and adjust the expected output accordingly.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> 
> ---
> v10: rebase to code cleanup
> v9: new patch
> ---
>  qemu-io-cmds.c                    | 30 ++++++++++++++++++------------
>  tests/qemu-iotests/019.out        |  8 ++++----
>  tests/qemu-iotests/common.pattern |  2 +-
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index fabc394..34f6707 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1760,7 +1760,7 @@ out:
>  static int alloc_f(BlockBackend *blk, int argc, char **argv)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> -    int64_t offset, sector_num, nb_sectors, remaining;
> +    int64_t offset, sector_num, nb_sectors, remaining, bytes;
>      char s1[64];
>      int num, ret;
>      int64_t sum_alloc;
> @@ -1776,18 +1776,24 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>      }
> 
>      if (argc == 3) {
> -        nb_sectors = cvtnum(argv[2]);
> -        if (nb_sectors < 0) {
> -            print_cvtnum_err(nb_sectors, argv[2]);
> +        bytes = cvtnum(argv[2]);
> +        if (bytes < 0) {
> +            print_cvtnum_err(bytes, argv[2]);
>              return 0;
> -        } else if (nb_sectors > INT_MAX) {
> -            printf("length argument cannot exceed %d, given %s\n",
> -                   INT_MAX, argv[2]);
> +        } else if (bytes > INT_MAX * BDRV_SECTOR_SIZE) {
> +            printf("length argument cannot exceed %llu, given %s\n",
> +                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
>              return 0;
>          }
>      } else {
> -        nb_sectors = 1;
> +        bytes = BDRV_SECTOR_SIZE;
>      }
> +    if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
> +        printf("bytes %" PRId64 " is not sector aligned\n",

This isn't real English. :-)

With that fixed (somehow, you know better than me how to):

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +               bytes);
> +        return 0;
> +    }
> +    nb_sectors = bytes >> BDRV_SECTOR_BITS;
> 
>      remaining = nb_sectors;
>      sum_alloc = 0;


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap
  2017-04-28 19:28   ` Max Reitz
@ 2017-04-28 19:48     ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-28 19:48 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]

On 04/28/2017 02:28 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> No tests were covering write zeroes with unmap.  Additionally,
>> I needed to prove that my previous patches for correct status
>> reporting and write zeroes optimizations actually had an impact.
>>
>> The test works for cluster_size between 8k and 2M (for smaller
>> sizes, it fails because our allocation patterns are not contiguous
>> with small clusters).
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> v10: drop any changes to v2 files, rewrite test to work with updates
>> earlier in the series, add a blkdebug probe
>> v9: new patch
>> ---
>>  tests/qemu-iotests/179     | 123 +++++++++++++++++++++++++++++++++++++++++++++
>>  tests/qemu-iotests/179.out |  90 +++++++++++++++++++++++++++++++++
>>  tests/qemu-iotests/group   |   1 +
>>  3 files changed, 214 insertions(+)
>>  create mode 100755 tests/qemu-iotests/179
>>  create mode 100644 tests/qemu-iotests/179.out
> 
> Can I convince you to write this test in a different way? I really have
> a hard time following all of the writes and building a mental image of
> how the qcow2 file should look in the end so I can compare it to the map
> output.
> 
> I'd like it better if you would check the qcow2 file after each group of
> write operations, maybe even creating a new image for every such group.
> But even just having more qemu-img map calls (and changing nothing else)
> would help me a lot.

Will do.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
@ 2017-04-28 19:53   ` Max Reitz
  2017-04-28 20:03     ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 19:53 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 931 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> Mixing byte offset and sector allocation counts is a bit
> confusing.  Also, reporting n/m sectors, where m decreases
> according to the remaining size of the file, isn't really
> adding any useful information.

Since this map doesn't leave out any range in the image file -- not
really, no. :-)

>                                 Update the output to use
> byte counts, and adjust the affected tests (./check -qcow2 102,
> ./check -vpc 146).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: rebase to updated test 179
> v9: new patch
> ---
>  qemu-io-cmds.c             |  5 ++---
>  tests/qemu-iotests/102.out |  4 ++--
>  tests/qemu-iotests/146.out | 30 +++++++++++++++---------------
>  tests/qemu-iotests/179.out | 22 +++++++++++-----------
>  4 files changed, 30 insertions(+), 31 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-28 19:46   ` Max Reitz
@ 2017-04-28 19:59     ` Eric Blake
  2017-04-28 20:09       ` Max Reitz
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-28 19:59 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1768 bytes --]

On 04/28/2017 02:46 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> For the 'alloc' command, accepting an offset in bytes but a length
>> in sectors, and reporting output in sectors, is confusing.  Do
>> everything in bytes, and adjust the expected output accordingly.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>

>>      }
>> +    if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>> +        printf("bytes %" PRId64 " is not sector aligned\n",
> 
> This isn't real English. :-)

But, it's just copy-and-paste from the other instances you just reviewed
in 6/17!  [Translation - if I change this one, I also get to redo that one]

Which of these various alternatives (if any) looks better:

bytes=511 is not sector-aligned
511 is not a sector-aligned value for 'bytes'
requested 'bytes' of 511 is not sector-aligned
alignment error: 511 bytes is not sector-aligned
'bytes' must be sector-aligned: 511
your clever entry here...

> 
> With that fixed (somehow, you know better than me how to):

Re-reading my various alternatives, I do think that /sector
aligned/sector-aligned/ helps no matter what; and that the remaining
trick is to use quoting or '=' or some other lexical trick to make it
obvious that 'bytes' is a parameter name whose value 511 is invalid,
rather than part of the actual error of a value that is not properly
aligned.

> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>

If you state a preference for one of my variants, then the respin will
use that variant consistently and add your R-b.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting
  2017-04-28 19:53   ` Max Reitz
@ 2017-04-28 20:03     ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-28 20:03 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1697 bytes --]

On 04/28/2017 02:53 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> Mixing byte offset and sector allocation counts is a bit
>> confusing.  Also, reporting n/m sectors, where m decreases
>> according to the remaining size of the file, isn't really
>> adding any useful information.
> 
> Since this map doesn't leave out any range in the image file -- not
> really, no. :-)
> 
>>                                 Update the output to use
>> byte counts, and adjust the affected tests (./check -qcow2 102,
>> ./check -vpc 146).

My commit message is stale if test 179 gets committed first (although,
as we mentioned earlier, I'm trying to split this series in two, which
would delay the introduction of 179...)

I still wonder if an even more concise representation is worth it; the
resulting output lines are still quite long.  We don't make any
guarantees about qemu-io back-compat, so I may still tweak it further
(but if I do, I'll drop R-b)

I should probably also add a blurb to the commit message about how
'qemu-img map' and 'qemu-io map' are two different beasts.

>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> v10: rebase to updated test 179
>> v9: new patch
>> ---
>>  qemu-io-cmds.c             |  5 ++---
>>  tests/qemu-iotests/102.out |  4 ++--
>>  tests/qemu-iotests/146.out | 30 +++++++++++++++---------------
>>  tests/qemu-iotests/179.out | 22 +++++++++++-----------
>>  4 files changed, 30 insertions(+), 31 deletions(-)
> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-28 19:59     ` Eric Blake
@ 2017-04-28 20:09       ` Max Reitz
  2017-04-28 20:36         ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 20:09 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2366 bytes --]

On 28.04.2017 21:59, Eric Blake wrote:
> On 04/28/2017 02:46 PM, Max Reitz wrote:
>> On 27.04.2017 03:46, Eric Blake wrote:
>>> For the 'alloc' command, accepting an offset in bytes but a length
>>> in sectors, and reporting output in sectors, is confusing.  Do
>>> everything in bytes, and adjust the expected output accordingly.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>>
> 
>>>      }
>>> +    if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>>> +        printf("bytes %" PRId64 " is not sector aligned\n",
>>
>> This isn't real English. :-)
> 
> But, it's just copy-and-paste from the other instances you just reviewed
> in 6/17!  [Translation - if I change this one, I also get to redo that one]

No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)

> 
> Which of these various alternatives (if any) looks better:
> 
> bytes=511 is not sector-aligned
> 511 is not a sector-aligned value for 'bytes'
> requested 'bytes' of 511 is not sector-aligned
> alignment error: 511 bytes is not sector-aligned> 'bytes' must be sector-aligned: 511
> your clever entry here...

How about "byte count" instead of "bytes" or "bytes value", if really
want to have the exact spelling in there?

For your entries above: (1) and (2) work for me (I like (2) a bit
better), (3) doesn't sound like real English either, and it should be
s/is/are/ in (4) (but it still sounds off with that change). (5) I
mostly dislike because I dislike error message of the form "This should
be X: $foo", I like "$foo is not X" better.

>> With that fixed (somehow, you know better than me how to):
> 
> Re-reading my various alternatives, I do think that /sector
> aligned/sector-aligned/ helps no matter what; and that the remaining
> trick is to use quoting or '=' or some other lexical trick to make it
> obvious that 'bytes' is a parameter name whose value 511 is invalid,
> rather than part of the actual error of a value that is not properly
> aligned.

First of all, I think the user is clever enough to figure out that a
description like "byte count" refers to the "bytes" parameter. ;-)

Secondly, this is even more true because this is a debugging tool so we
don't have to worry too much about... Let's say inexperienced users.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
  2017-04-28 17:12   ` Max Reitz
@ 2017-04-28 20:18   ` Eric Blake
  1 sibling, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-04-28 20:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 2929 bytes --]

On 04/26/2017 08:46 PM, Eric Blake wrote:
> We had some conflicting documentation: a nice 8-way table that
> described all possible combinations of DATA, ZERO, and
> OFFSET_VALID, couple with text that implied that OFFSET_VALID
> always meant raw data could be read directly.  As the 8-way
> table is the intended semantics, simplify the rest of the text
> to get rid of the confusion.
> 
> Suggested-by: Max Reitz <mreitz@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---

>  /*
>   * Allocation status flags
> - * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status.
> + * BDRV_BLOCK_DATA: data is read from a file returned by bdrv_get_block_status

I guess we always return the BDS where the data is read from, even when
we can't actually provide an offset to that data (such as when the
actual data is encrypted or compressed, and therefore has no raw
offset).  But I'm still wondering if this line can be shortened.

>   * BDRV_BLOCK_ZERO: sectors read as zero
> - * BDRV_BLOCK_OFFSET_VALID: sector stored as raw data in a file returned by
> - *                          bdrv_get_block_status.

I think when this was originally written, we blindly assumed that offset
pointed into bs->file.  Then with the exposure of the BDS graph, we
changed bdrv_get_block_status() to have a BlockDriverState **file
parameter (offset points into the returned file), since bs->file need
not always be the right BDS.  This old comment is on track...

> + * BDRV_BLOCK_OFFSET_VALID: an associated offset exists for accessing raw data
>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>   *                       layer (as opposed to the backing file)
>   * BDRV_BLOCK_RAW: used internally to indicate that the request
>   *                 was answered by the raw driver and that one
>   *                 should look in bs->file directly.
>   *
> - * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
> - * bs->file where sector data can be read from as raw data.

...but this one is stale...

> - *
>   * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
>   *
> + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset
> + * in bs->file that is allocated for the corresponding raw data;

...and I reused the stale wording.  Oops.

> + * however, whether that offset actually contains data also depends on
> + * BDRV_BLOCK_DATA, as follows:
> + *
>   * DATA ZERO OFFSET_VALID
>   *  t    t        t       sectors read as zero, bs->file is zero at offset
>   *  t    f        t       sectors read as valid from bs->file at offset

And these references to bs->file are stale too.

Guess I have more to fix on the respin.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-28 20:09       ` Max Reitz
@ 2017-04-28 20:36         ` Eric Blake
  2017-04-28 20:52           ` Max Reitz
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-28 20:36 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 2682 bytes --]

On 04/28/2017 03:09 PM, Max Reitz wrote:
> On 28.04.2017 21:59, Eric Blake wrote:
>> On 04/28/2017 02:46 PM, Max Reitz wrote:
>>> On 27.04.2017 03:46, Eric Blake wrote:
>>>> For the 'alloc' command, accepting an offset in bytes but a length
>>>> in sectors, and reporting output in sectors, is confusing.  Do
>>>> everything in bytes, and adjust the expected output accordingly.
>>>>
>>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>>>
>>
>>>>      }
>>>> +    if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>>>> +        printf("bytes %" PRId64 " is not sector aligned\n",
>>>
>>> This isn't real English. :-)
>>
>> But, it's just copy-and-paste from the other instances you just reviewed
>> in 6/17!  [Translation - if I change this one, I also get to redo that one]
> 
> No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)

Then an obvious solution: s/bytes/count/ in the parameter name :)

But I still get to redo those, to add the '-' in 'sector-aligned'.

> 
>>
>> Which of these various alternatives (if any) looks better:
>>
>> bytes=511 is not sector-aligned
>> 511 is not a sector-aligned value for 'bytes'
>> requested 'bytes' of 511 is not sector-aligned
>> alignment error: 511 bytes is not sector-aligned
>> 'bytes' must be sector-aligned: 511
>> your clever entry here...
> 
> How about "byte count" instead of "bytes" or "bytes value", if really
> want to have the exact spelling in there?
> 
> For your entries above: (1) and (2) work for me (I like (2) a bit
> better), (3) doesn't sound like real English either, and it should be
> s/is/are/ in (4) (but it still sounds off with that change). (5) I
> mostly dislike because I dislike error message of the form "This should
> be X: $foo", I like "$foo is not X" better.

Maybe this variation of (3) solves the singular/plural disconnect:

request of 511 for 'bytes' is not sector-aligned

which makes it obvious that the "request of 511" (singular) and not the
parameter name (whether singular 'count' or plural 'bytes') is the
subject.  But it's a bit wordier than (2).  So it looks like (2) may be
a winner in all the situations.  But I also think you convinced me to
rename the command parameter; in my next spin, the help text will read:

alloc offset [count] -- checks if offset is allocated in the file

which starts to be non-trivial enough to drop R-b that you were willing
to give for just an error message wording change.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
@ 2017-04-28 20:48   ` Max Reitz
  2017-04-28 21:24     ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Max Reitz @ 2017-04-28 20:48 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 15544 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> We've already improved discards to operate efficiently on the tail
> of an unaligned qcow2 image; it's time to make a similar improvement
> to write zeroes.  The special case is only valid at the tail
> cluster of a file, where we must recognize that any sectors beyond
> the image end would implicitly read as zero, and therefore should
> not penalize our logic for widening a partial cluster into writing
> the whole cluster as zero.
> 
> Update test 154 to cover the new scenarios, using two images of
> intentionally differing length.
> 
> While at it, fix the test to gracefully skip when run as
> ./check -qcow2 -o compat=0.10 154
> since the older format lacks zero clusters already required earlier
> in the test.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: rebase to better reporting of preallocated zero clusters
> v9: new patch
> ---
>  block/qcow2.c              |   7 +++
>  tests/qemu-iotests/154     | 112 ++++++++++++++++++++++++++++++++++++++++++++-
>  tests/qemu-iotests/154.out |  83 +++++++++++++++++++++++++++++++++
>  3 files changed, 200 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 5c1573c..8038793 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
>      BlockDriverState *file;
>      int64_t res;
> 
> +    if (start + count > bs->total_sectors) {
> +        count = bs->total_sectors - start;
> +    }
> +
>      if (!count) {
>          return true;
>      }
> @@ -2467,6 +2471,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
>      uint32_t tail = (offset + count) % s->cluster_size;
> 
>      trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count);
> +    if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
> +        tail = 0;
> +    }
> 
>      if (head || tail) {
>          int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
> diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
> index 7ca7219..005f09f 100755
> --- a/tests/qemu-iotests/154
> +++ b/tests/qemu-iotests/154
> @@ -2,7 +2,7 @@
>  #
>  # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034)
>  #
> -# Copyright (C) 2016 Red Hat, Inc.
> +# Copyright (C) 2016-2017 Red Hat, Inc.
>  #
>  # This program is free software; you can redistribute it and/or modify
>  # it under the terms of the GNU General Public License as published by
> @@ -42,7 +42,10 @@ _supported_proto file
>  _supported_os Linux
> 
>  CLUSTER_SIZE=4k
> -size=128M
> +size=$((128 * 1024 * 1024))
> +
> +# This test requires zero clusters, added in v3 images
> +_unsupported_imgopts compat=0.10
> 
>  echo
>  echo == backing file contains zeros ==
> @@ -299,6 +302,111 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io
> 
>  $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> 
> +echo
> +echo == unaligned image tail cluster, no allocation needed ==
> +
> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))

Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
actually think that would be a better test because as it is, the image's
"unaligned" tail is exactly one cluster (so it isn't really unaligned).

> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# With no backing file, write to all or part of unallocated partial cluster
> +
> +# Backing file: 128m: ... | 00 --

Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
so this is just a single cluster; which will be converted from
unallocated to a zero cluster because the tail reads as zero.

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | -- 00

Same here, but now it's the head that reads as zero (and it's already a
zero cluster so nothing changes here).

> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

But I suppose you mean $((size + 512))?

> +
> +# Backing file: 128m: ... | 00 00

And finally this doesn't change anything either -- but then again this
is a fully aligned request, so I don't quite see the point in testing
this here.

> +$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# No offset should be allocated, although the cluster itself is considered
> +# allocated due to the zero flag
> +$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
> +
> +# With backing file that reads as zero, write to all or part of entire cluster
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 00 00
> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | 00 00 -- --

How so? Shouldn't this just stay a zero cluster because the rest of the
cluster already reads as zero?

> +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | -- -- 00 00

Same here.

> +$QEMU_IO -c "write -z $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | 00 00
> +# Active layer: 128m: ... | -- 00 00 --

Same here.

Also, I'm not quite sure what you are testing by just issuing three
consecutive write requests without checking the result each time. You
assume it to be something which you wrote in the comments above each,
but you can't be sure (yes, if something goes wrong in between, the
following final map should catch it, but it's not for certain).

> +$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
> +
> +# No offset should be allocated, although the cluster itself is considered
> +# allocated due to the zero flag
> +$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +# Preallocated cluster is not freed when unmap is not requested
> +
> +# Backing file: 128m: ... | XX -- => ... | XX 00
> +$QEMU_IO -c "write $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +
> +# Backing file: 128m: ... | XX 00 => ... | 00 00
> +$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map |
> +    sed 's/"offset": [0-9]*/"offset": OFFSET/g'
> +
> +echo
> +echo == unaligned image tail cluster, allocation required ==
> +
> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# Write beyond backing file must COW
> +# Backing file: 128m: ... | XX --

Here, the "--" is an unallocated cluster...

> +# Active layer: 128m: ... | -- -- 00 --

...while here it is normally allocated data. Or is "--" supposed to mean
you just don't write to that area...?

> +
> +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# Writes at boundaries of (partial) cluster must not lose mid-cluster data
> +# Backing file: 128m: ... | -- XX
> +# Active layer: 128m: ... | 00 -- -- 00
> +
> +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io

[1]

> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
> +
> +# Partial write must not lose data
> +# Backing file: 128m: ... | -- --
> +# Active layer: 128m: ... | -- -- XX 00

Well, you have basically tested this in [1] already. After the first
write -z to the active layer, the final four sectors are fully allocated
(as they are in a single cluster) and look like this:

00 01 00 00 = XX XX XX XX

(Because the 00s are just written as data.)

Now the write -z in [1] just overwrites the last of those four sectors
with zero data (albeit it being zero already).

But if you want to test it again, I'm not stopping you. :-)

Max

> +
> +$QEMU_IO -c "write -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "read -P 0 $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
> +
>  # success, all done
>  echo "*** done"
>  rm -f $seq.full
> diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
> index da9eabd..f8a09af 100644
> --- a/tests/qemu-iotests/154.out
> +++ b/tests/qemu-iotests/154.out
> @@ -282,4 +282,87 @@ read 1024/1024 bytes at offset 76800
>  { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
>  { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
>  { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]
> +
> +== unaligned image tail cluster, no allocation needed ==
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +1024/1024 bytes allocated at offset 128 MiB
> +[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}]
> +wrote 2048/2048 bytes at offset 134217728
> +2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
> +wrote 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134218752
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134218240
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +2048/2048 bytes allocated at offset 128 MiB
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 0, "zero": true, "data": false},
> +{ "start": 134217728, "length": 1024, "depth": 0, "zero": true, "data": false, "offset": OFFSET}]
> +
> +== unaligned image tail cluster, allocation required ==
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134218752
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1536/1536 bytes at offset 134218240
> +1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1024/1024 bytes at offset 134218752
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134219264
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134217728
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134218240
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1024/1024 bytes at offset 134218752
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
> +Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
> +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
> +wrote 512/512 bytes at offset 134218752
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +wrote 512/512 bytes at offset 134219264
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 1024/1024 bytes at offset 134217728
> +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134218752
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +read 512/512 bytes at offset 134219264
> +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
> +{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
>  *** done
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-28 20:36         ` Eric Blake
@ 2017-04-28 20:52           ` Max Reitz
  0 siblings, 0 replies; 40+ messages in thread
From: Max Reitz @ 2017-04-28 20:52 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 3082 bytes --]

On 28.04.2017 22:36, Eric Blake wrote:
> On 04/28/2017 03:09 PM, Max Reitz wrote:
>> On 28.04.2017 21:59, Eric Blake wrote:
>>> On 04/28/2017 02:46 PM, Max Reitz wrote:
>>>> On 27.04.2017 03:46, Eric Blake wrote:
>>>>> For the 'alloc' command, accepting an offset in bytes but a length
>>>>> in sectors, and reporting output in sectors, is confusing.  Do
>>>>> everything in bytes, and adjust the expected output accordingly.
>>>>>
>>>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>>>>
>>>
>>>>>      }
>>>>> +    if (!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE)) {
>>>>> +        printf("bytes %" PRId64 " is not sector aligned\n",
>>>>
>>>> This isn't real English. :-)
>>>
>>> But, it's just copy-and-paste from the other instances you just reviewed
>>> in 6/17!  [Translation - if I change this one, I also get to redo that one]
>>
>> No, those are "offset" and "count" -- all singular. "bytes" is plural. ;-)
> 
> Then an obvious solution: s/bytes/count/ in the parameter name :)
> 
> But I still get to redo those, to add the '-' in 'sector-aligned'.

Oh, right! Didn't even notice. Well, in real languages stuff like that
would have to be joined into a single word anyway.

>>> Which of these various alternatives (if any) looks better:
>>>
>>> bytes=511 is not sector-aligned
>>> 511 is not a sector-aligned value for 'bytes'
>>> requested 'bytes' of 511 is not sector-aligned
>>> alignment error: 511 bytes is not sector-aligned
>>> 'bytes' must be sector-aligned: 511
>>> your clever entry here...
>>
>> How about "byte count" instead of "bytes" or "bytes value", if really
>> want to have the exact spelling in there?
>>
>> For your entries above: (1) and (2) work for me (I like (2) a bit
>> better), (3) doesn't sound like real English either, and it should be
>> s/is/are/ in (4) (but it still sounds off with that change). (5) I
>> mostly dislike because I dislike error message of the form "This should
>> be X: $foo", I like "$foo is not X" better.
> 
> Maybe this variation of (3) solves the singular/plural disconnect:
> 
> request of 511 for 'bytes' is not sector-aligned
> 
> which makes it obvious that the "request of 511" (singular) and not the
> parameter name (whether singular 'count' or plural 'bytes') is the
> subject.  But it's a bit wordier than (2).  So it looks like (2) may be
> a winner in all the situations.  But I also think you convinced me to
> rename the command parameter; in my next spin, the help text will read:
> 
> alloc offset [count] -- checks if offset is allocated in the file
> 
> which starts to be non-trivial enough to drop R-b that you were willing
> to give for just an error message wording change.

Well, reviewing the change will be simple enough, so it doesn't really
matter to me. :-)

(I'm still a bit upset why you think that the average qemu-io user
cannot make the connection between "byte count" and a parameter named
"bytes". Because I am the average qemu-io user. Huff!)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
  2017-04-28 20:48   ` Max Reitz
@ 2017-04-28 21:24     ` Eric Blake
  2017-05-04  2:47       ` Eric Blake
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Blake @ 2017-04-28 21:24 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 9121 bytes --]

On 04/28/2017 03:48 PM, Max Reitz wrote:
> On 27.04.2017 03:46, Eric Blake wrote:
>> We've already improved discards to operate efficiently on the tail
>> of an unaligned qcow2 image; it's time to make a similar improvement
>> to write zeroes.  The special case is only valid at the tail
>> cluster of a file, where we must recognize that any sectors beyond
>> the image end would implicitly read as zero, and therefore should
>> not penalize our logic for widening a partial cluster into writing
>> the whole cluster as zero.
>>
>> Update test 154 to cover the new scenarios, using two images of
>> intentionally differing length.
>>
>> While at it, fix the test to gracefully skip when run as
>> ./check -qcow2 -o compat=0.10 154
>> since the older format lacks zero clusters already required earlier
>> in the test.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>

>>
>> +echo
>> +echo == unaligned image tail cluster, no allocation needed ==
>> +
>> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
> 
> Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
> actually think that would be a better test because as it is, the image's
> "unaligned" tail is exactly one cluster (so it isn't really unaligned).

Uhhh - copy-and-paste?  You're right, that 1024 is too small for what
I'm actually doing with it.  :(

> 
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# With no backing file, write to all or part of unallocated partial cluster
>> +
>> +# Backing file: 128m: ... | 00 --
> 
> Not sure what the "00 --" is supposed to mean. The cluster size is 1 kB,
> so this is just a single cluster; which will be converted from
> unallocated to a zero cluster because the tail reads as zero.
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io

I'm doing a write of 512 bytes, so I was trying to show that in the
cluster starting at 128m, we have a write request for one sector to be
written to 00, while the rest of the sector is left uninitialized. The
obvious intent is that we note that the uninitialized portion reads as
zero, so we can optimize the entire cluster (even though it is a partial
cluster) to be a cluster write-zero instead of a sector-based allocating
write.  But since you've pointed out that my setup is flawed, I'll have
to double-check that I'm actually testing what I thought I was (I know I
hit the right breakpoints in gdb, but its the iotest expected output
that needs to show the difference between pre- and post-patched qemu
with regards to the partial-tail-cluster optimizations).

>> +
>> +# Backing file: 128m: ... | -- 00
> 
> Same here, but now it's the head that reads as zero (and it's already a
> zero cluster so nothing changes here).
> 
>> +$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
> 
> But I suppose you mean $((size + 512))?

Umm. Yeah.  (Hey - you're not the only one that writes patches late at
night).

> 
>> +
>> +# Backing file: 128m: ... | 00 00
> 
> And finally this doesn't change anything either -- but then again this
> is a fully aligned request, so I don't quite see the point in testing
> this here.

At one point during my development, I had a bug where a partial write
request at the head of the cluster behaved differently from a partial
write request at the end (one allocated while the other did not).

Maybe it's best if I do a single write, then inspect the map, then reset
the image, rather than doing all three writes in a row and proving that
the net result of the three writes had no allocation.  In v9, when I was
trying to use unallocated clusters as a shortcut on files with no
backing image, this actually worked (after all three writes, the cluster
would still be unallocated); but since Kevin convinced me that v10 has
to set the zero flag on all three writes, I'm back to the point of
needing to clear the zero flag between writes.

Okay, I'll definitely have to fix it.

>> +# With backing file that reads as zero, write to all or part of entire cluster
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 00 00
>> +$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +# Backing file: 128m: ... | 00 00
>> +# Active layer: 128m: ... | 00 00 -- --
> 
> How so? Shouldn't this just stay a zero cluster because the rest of the
> cluster already reads as zero?

Same problem as above.  In v9, I was trying to exploit code that left a
cluster unallocated; but that's not viable, so I'll have to reset the
image between steps.

> 
> Also, I'm not quite sure what you are testing by just issuing three
> consecutive write requests without checking the result each time. You
> assume it to be something which you wrote in the comments above each,
> but you can't be sure (yes, if something goes wrong in between, the
> following final map should catch it, but it's not for certain).

Well, it helped me during development, but you're absolutely right that
we can't be certain in the long-run.  3 separate checks, with resets
between, is indeed the way to go.


>> +echo
>> +echo == unaligned image tail cluster, allocation required ==
>> +
>> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))

Here, my backing cluster size is intentionally small, so that
bdrv_get_block_status() can easily return sane results for the backing
file portions, so only the active layer has a partial cluster. But
making it obvious in the comments can't hurt, given that my earlier
tests were invalidated when my size choice didn't actually match with
the operations I was attempting.

>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# Write beyond backing file must COW
>> +# Backing file: 128m: ... | XX --
> 
> Here, the "--" is an unallocated cluster...
> 
>> +# Active layer: 128m: ... | -- -- 00 --
> 
> ...while here it is normally allocated data. Or is "--" supposed to mean
> you just don't write to that area...?

If we had subclusters, then it would be unallocated data.  But we don't.
 Since I just reset the image, I'm doing a write onto an unallocated
cluster in the current layer; the write request is to a subset of the
cluster, and the code checking whether the rest of the cluster reads as
zeros will see that a COW (and corresponding allocation) is required.

> 
>> +
>> +$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
>> +$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# Writes at boundaries of (partial) cluster must not lose mid-cluster data
>> +# Backing file: 128m: ... | -- XX
>> +# Active layer: 128m: ... | 00 -- -- 00
>> +
>> +$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
>> +$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_> +$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
> 
> [1]
> 
>> +$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>> +
>> +CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
>> +_make_test_img -b "$TEST_IMG.base" $((size + 2048))
>> +
>> +# Partial write must not lose data
>> +# Backing file: 128m: ... | -- --
>> +# Active layer: 128m: ... | -- -- XX 00
> 
> Well, you have basically tested this in [1] already. After the first
> write -z to the active layer, the final four sectors are fully allocated
> (as they are in a single cluster) and look like this:
> 
> 00 01 00 00 = XX XX XX XX
> 
> (Because the 00s are just written as data.)
> 
> Now the write -z in [1] just overwrites the last of those four sectors
> with zero data (albeit it being zero already).
> 
> But if you want to test it again, I'm not stopping you. :-)
> 

Well, you've already proven that I need to revisit this test with a
fine-tooth comb, especially if I split the series to do the blkdebug
stuff separately from the qcow2 stuff.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned Eric Blake
@ 2017-05-03 17:56   ` Max Reitz
  0 siblings, 0 replies; 40+ messages in thread
From: Max Reitz @ 2017-05-03 17:56 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
> is only passed cluster-aligned start values; but we can further
> tighten the assertion that the only unaligned end value is at EOF.
> 
> Recent commits have taken advantage of an unaligned tail cluster,
> for both discard and write zeroes.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: rebase to context
> v9: rebase to master, by asserting that only tail cluster is unaligned
> v7, v8: only earlier half of series submitted for 2.9
> v6: avoid assertion on non-cluster-aligned image, use s->cluster_sectors
> to avoid a shift, drop R-b
> v5: no change
> v4: new patch
> ---
>  block/qcow2-cluster.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count
  2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-05-03 18:28   ` Max Reitz
  0 siblings, 0 replies; 40+ messages in thread
From: Max Reitz @ 2017-05-03 18:28 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 5790 bytes --]

On 27.04.2017 03:46, Eric Blake wrote:
> Passing a byte offset, but sector count, when we ultimately
> want to operate on cluster granularity, is madness.  Clean up
> the external interfaces to take both offset and count as bytes,
> while still keeping the assertion added previously that the
> caller must align the values to a cluster.  Then rename things
> to make sure backports don't get confused by changed units:
> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
> we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
> 
> The internal functions still operate on clusters at a time, and
> return an int for number of cleared clusters; but on an image
> with 2M clusters, a single L2 table holds 256k entries that each
> represent a 2M cluster, totalling well over INT_MAX bytes if we
> ever had a request for that many bytes at once.  All our callers
> currently limit themselves to 32-bit bytes (and therefore fewer
> clusters), but by making this function 64-bit clean, we have one
> less place to clean up if we later improve the block layer to
> support 64-bit bytes through all operations (with the block layer
> auto-fragmenting on behalf of more-limited drivers), rather than
> the current state where some interfaces are artificially limited
> to INT_MAX at a time.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> v10: squash in fixup accounting for unaligned file end
> v9: rebase to earlier changes, drop R-b
> v7, v8: only earlier half of series submitted for 2.9
> v6: rebase due to context
> v5: s/count/byte/ to make the units obvious, and rework the math
> to ensure no 32-bit integer overflow on large clusters
> v4: improve function names, split assertion additions into earlier patch
> [no v3 or v2]
> v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
> ---
>  block/qcow2.h          |  9 +++++----
>  block/qcow2-cluster.c  | 40 +++++++++++++++++++++-------------------
>  block/qcow2-snapshot.c |  7 +++----
>  block/qcow2.c          | 21 +++++++++------------
>  4 files changed, 38 insertions(+), 39 deletions(-)

[...]

> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 4f641a9..a47aadc 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1562,16 +1562,16 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>      return nb_clusters;
>  }
> 
> -int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> -    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
> +int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
> +                          uint64_t bytes, enum qcow2_discard_type type,
> +                          bool full_discard)
>  {
>      BDRVQcow2State *s = bs->opaque;
> -    uint64_t end_offset;
> +    uint64_t end_offset = offset + bytes;
>      uint64_t nb_clusters;
> +    int64_t cleared;
>      int ret;
> 
> -    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
> -
>      /* Caller must pass aligned values, except at image end */
>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||

Applying an s/end_offset - offset/bytes/ to the
nb_clusters = size_to_clusters(s, end_offset - offset) following this
would make sense (but won't functionally change anything).

> @@ -1583,13 +1583,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
> 
>      /* Each L2 table is handled by its own loop iteration */
>      while (nb_clusters > 0) {
> -        ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
> -        if (ret < 0) {
> +        cleared = discard_single_l2(bs, offset, nb_clusters, type,
> +                                    full_discard);
> +        if (cleared < 0) {
> +            ret = cleared;
>              goto fail;
>          }
> 
> -        nb_clusters -= ret;
> -        offset += (ret * s->cluster_size);
> +        nb_clusters -= cleared;
> +        offset += (cleared * s->cluster_size);
>      }
> 
>      ret = 0;

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 8038793..4d34610 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2858,18 +2857,16 @@ static int qcow2_make_empty(BlockDriverState *bs)
> 
>      /* This fallback code simply discards every active cluster; this is slow,
>       * but works in all cases */
> -    for (start_sector = 0; start_sector < bs->total_sectors;
> -         start_sector += sector_step)
> +    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
> +    for (offset = 0; offset < end_offset; offset += step)
>      {

This opening brace should now be on the previous line.

With at least this fixed (I leave the other thing to your discretion):

Reviewed-by: Max Reitz <mreitz@redhat.com>

>          /* As this function is generally used after committing an external
>           * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
>           * default action for this kind of discard is to pass the discard,
>           * which will ideally result in an actually smaller image file, as
>           * is probably desired. */
> -        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
> -                                     MIN(sector_step,
> -                                         bs->total_sectors - start_sector),
> -                                     QCOW2_DISCARD_SNAPSHOT, true);
> +        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
> +                                    QCOW2_DISCARD_SNAPSHOT, true);
>          if (ret < 0) {
>              break;
>          }
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster
  2017-04-28 21:24     ` Eric Blake
@ 2017-05-04  2:47       ` Eric Blake
  0 siblings, 0 replies; 40+ messages in thread
From: Eric Blake @ 2017-05-04  2:47 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: kwolf, qemu-block

[-- Attachment #1: Type: text/plain, Size: 1247 bytes --]

On 04/28/2017 04:24 PM, Eric Blake wrote:

>>> +echo
>>> +echo == unaligned image tail cluster, no allocation needed ==
>>> +
>>> +CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
>>
>> Any reason for the CLUSTER_SIZE? It passes with 64 kB as well, and I
>> actually think that would be a better test because as it is, the image's
>> "unaligned" tail is exactly one cluster (so it isn't really unaligned).
> 
> Uhhh - copy-and-paste?  You're right, that 1024 is too small for what
> I'm actually doing with it.  :(

Actually, the whole test defaults to 4k clusters except when overridden,
but I did learn while hammering at things that we don't have a nice way
to tell that a backing file slightly shorter than the active file
behaves as though we read all zeros for the difference. I'm thinking of
proposing an RFC patch introducing BDRV_BLOCK_EOF, which gets set any
time bdrv_get_block_status() clamps the returns *pnum because it hit
EOF, to see what optimizations fall out elsewhere in the tree as a
result.  (It may not be worth it in the end, but we'll see).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2017-05-04  2:47 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-27  1:46 [Qemu-devel] [PATCH v10 00/17] add blkdebug tests Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 01/17] block: Update comments on BDRV_BLOCK_* meanings Eric Blake
2017-04-28 17:12   ` Max Reitz
2017-04-28 20:18   ` Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 02/17] qcow2: Correctly report status of preallocated zero clusters Eric Blake
2017-04-28 17:35   ` Max Reitz
2017-04-28 19:04     ` Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 03/17] qcow2: Reuse " Eric Blake
2017-04-28 17:51   ` Max Reitz
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 04/17] qcow2: Optimize zero_single_l2() to minimize L2 churn Eric Blake
2017-04-28 18:00   ` Max Reitz
2017-04-28 19:11     ` Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 05/17] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
2017-04-28 19:28   ` Max Reitz
2017-04-28 19:48     ` Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 06/17] qemu-io: Don't open-code QEMU_IS_ALIGNED Eric Blake
2017-04-28 19:30   ` Max Reitz
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 07/17] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
2017-04-28 19:46   ` Max Reitz
2017-04-28 19:59     ` Eric Blake
2017-04-28 20:09       ` Max Reitz
2017-04-28 20:36         ` Eric Blake
2017-04-28 20:52           ` Max Reitz
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 08/17] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
2017-04-28 19:53   ` Max Reitz
2017-04-28 20:03     ` Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 09/17] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
2017-04-28 20:48   ` Max Reitz
2017-04-28 21:24     ` Eric Blake
2017-05-04  2:47       ` Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 10/17] qcow2: Assert that cluster operations are aligned Eric Blake
2017-05-03 17:56   ` Max Reitz
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 11/17] qcow2: Discard/zero clusters by byte count Eric Blake
2017-05-03 18:28   ` Max Reitz
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 12/17] blkdebug: Sanity check block layer guarantees Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 13/17] blkdebug: Refactor error injection Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 14/17] blkdebug: Add pass-through write_zero and discard support Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 15/17] blkdebug: Simplify override logic Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 16/17] blkdebug: Add ability to override unmap geometries Eric Blake
2017-04-27  1:46 ` [Qemu-devel] [PATCH v10 17/17] tests: Add coverage for recent block geometry fixes Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.