All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests
@ 2017-04-11  1:17 Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file Eric Blake
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v9

Prerequisite: Max's block-next tree:
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg01298.html

v6 was:
https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg01562.html
v7 and v8 were the first half of v6, and already applied to 2.9,
at commit f82c5b17.

Since then:
- Rebase to master
- Add more qcow2 patches to make write zeroes at unaligned tail cluster
more like discard of the same cluster, including more tests

001/13:[down] 'qcow2: Unallocate unmapped zero clusters if no backing file'
002/13:[down] 'iotests: Add test 179 to cover write zeroes with unmap'
003/13:[down] 'qemu-io: Switch 'alloc' command to byte-based length'
004/13:[down] 'qemu-io: Switch 'map' output to byte-based reporting'
005/13:[down] 'qcow2: Optimize write zero of unaligned tail cluster'
006/13:[0018] [FC] 'qcow2: Assert that cluster operations are aligned'
007/13:[0003] [FC] 'qcow2: Discard/zero clusters by byte count'
008/13:[----] [--] 'blkdebug: Sanity check block layer guarantees'
009/13:[----] [--] 'blkdebug: Refactor error injection'
010/13:[----] [--] 'blkdebug: Add pass-through write_zero and discard support'
011/13:[----] [--] 'blkdebug: Simplify override logic'
012/13:[0036] [FC] 'blkdebug: Add ability to override unmap geometries'
013/13:[----] [-C] 'tests: Add coverage for recent block geometry fixes'

Eric Blake (13):
  qcow2: Unallocate unmapped zero clusters if no backing file
  iotests: Add test 179 to cover write zeroes with unmap
  qemu-io: Switch 'alloc' command to byte-based length
  qemu-io: Switch 'map' output to byte-based reporting
  qcow2: Optimize write zero of unaligned tail cluster
  qcow2: Assert that cluster operations are aligned
  qcow2: Discard/zero clusters by byte count
  blkdebug: Sanity check block layer guarantees
  blkdebug: Refactor error injection
  blkdebug: Add pass-through write_zero and discard support
  blkdebug: Simplify override logic
  blkdebug: Add ability to override unmap geometries
  tests: Add coverage for recent block geometry fixes

 qapi/block-core.json              |  33 ++++-
 block/qcow2.h                     |   9 +-
 block/blkdebug.c                  | 264 +++++++++++++++++++++++++++++++-------
 block/qcow2-cluster.c             |  62 +++++----
 block/qcow2-snapshot.c            |   7 +-
 block/qcow2.c                     |  28 ++--
 qemu-io-cmds.c                    |  35 ++---
 tests/qemu-iotests/019.out        |   8 +-
 tests/qemu-iotests/102.out        |   4 +-
 tests/qemu-iotests/146.out        |  30 ++---
 tests/qemu-iotests/154            | 111 +++++++++++++++-
 tests/qemu-iotests/154.out        |  82 ++++++++++++
 tests/qemu-iotests/177            | 114 ++++++++++++++++
 tests/qemu-iotests/177.out        |  49 +++++++
 tests/qemu-iotests/179            |  78 +++++++++++
 tests/qemu-iotests/179.out        |  22 ++++
 tests/qemu-iotests/common.pattern |   2 +-
 tests/qemu-iotests/group          |   2 +
 18 files changed, 809 insertions(+), 131 deletions(-)
 create mode 100755 tests/qemu-iotests/177
 create mode 100644 tests/qemu-iotests/177.out
 create mode 100755 tests/qemu-iotests/179
 create mode 100644 tests/qemu-iotests/179.out

-- 
2.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-12  9:49   ` Kevin Wolf
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 02/13] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

'qemu-img map' already coalesces information about unallocated
clusters (those with status 0) and pure zero clusters (those
with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
qcow2 images with no backing file already report all unallocated
clusters (in the preallocation sense of clusters with no offset)
as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
to external users), thanks to generic block layer code in
bdrv_co_get_block_status().

So, for an image with no backing file, having bdrv_pwrite_zeroes
mark clusters as unallocated (defer to backing file) rather than
reads-as-zero (regardless of backing file) makes no difference
to normal behavior, but may potentially allow for fewer writes to
the L2 table of a freshly-created image where the L2 table is
initially written to all-zeroes (although I don't actually know
if we skip an L2 update and flush when re-writing the same
contents as already present).

Furthermore, this matches the behavior of discard_single_l2(), in
favoring an unallocated cluster over a zero cluster when full
discard is requested.

Meanwhile, version 2 qcow2 files (compat=0.10) lack support for an
explicit zero cluster.  This minor tweak therefore allows us to turn
write zeroes with unmap into an actual unallocation on those files,
where they used to return -ENOTSUP and cause an allocation due to
the fallback to explicitly written zeroes.

Note that technically, we _could_ write a cluster as unallocated
rather than zero if a backing file exists but the backing file
also reads as zero, but that's more expensive to determine, so
this optimization is limited to qcow2 without a backing file.

Also note that this patch unmaps a compressed cluster when there
is no backing file, even when BDRV_REQ_MAY_UNMAP was not set, but
it is unlikely to have compressed clusters in a fully preallocated
image (the point of compression is to reduce space requirements),
so the side effect of unmapping a cluster in that case is not
deemed to be a problem.

Finally, note that this change can create a subtle difference if a
backing file is later added with 'qemu-img rebase -u'; pre-patch,
a v3 file (compat=1.1) will have a cluster that still reads as 0
(the cluster is not allocated in the sense of preallocation, but
is provided from the layer), while post-patch the cluster will
now defer to the backing file - but it's already an unsupported
action if you are modifying guest-visible data by messing with
backing chains ;)

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v9: new patch
---
 block/qcow2-cluster.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 100398c..12f44b2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1579,7 +1579,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         /* Update L2 entries */
         qcow2_cache_entry_mark_dirty(bs, s->l2_table_cache, l2_table);
         if (old_offset & QCOW_OFLAG_COMPRESSED || flags & BDRV_REQ_MAY_UNMAP) {
-            l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
+            l2_table[l2_index + i] = bs->backing
+                ? cpu_to_be64(QCOW_OFLAG_ZERO) : 0;
             qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
         } else {
             l2_table[l2_index + i] |= cpu_to_be64(QCOW_OFLAG_ZERO);
@@ -1598,8 +1599,11 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     uint64_t nb_clusters;
     int ret;

-    /* The zero flag is only supported by version 3 and newer */
-    if (s->qcow_version < 3) {
+    /* The zero flag is only supported by version 3 and newer; we
+     * require the use of that flag if there is a backing file or if
+     * we are not allowed to unmap.  */
+    if (s->qcow_version < 3 &&
+        (bs->backing || !(flags & BDRV_REQ_MAY_UNMAP))) {
         return -ENOTSUP;
     }

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 02/13] iotests: Add test 179 to cover write zeroes with unmap
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

No tests were covering write zeroes with unmap.  Additionally,
I wanted to prove that my patch to optimize write zeroes for
compat=0.10 images actually had an impact; for that, run:
./check -qcow2 -o compat=0.10 179

Writing the test to work correctly for both old and new qcow2
images is a bit tricky: 'qemu-img map' works for showing
whether a cluster is assigned an offset (in the preallocation
sense), but older images have to write literal zeroes where
newer images can set the zero flag.  Thankfully, 'qemu-io alloc'
hides the difference, by instead reporting whether a cluster's
content comes from the current layer (regardless of whether it
was due to a cluster allocation or a flag).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/179     | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/179.out | 22 +++++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 101 insertions(+)
 create mode 100755 tests/qemu-iotests/179
 create mode 100644 tests/qemu-iotests/179.out

diff --git a/tests/qemu-iotests/179 b/tests/qemu-iotests/179
new file mode 100755
index 0000000..338a45d
--- /dev/null
+++ b/tests/qemu-iotests/179
@@ -0,0 +1,78 @@
+#!/bin/bash
+#
+# Test case for write zeroes with unmap
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo '=== Testing write zeroes with unmap ==='
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 16M
+_make_test_img -b "$TEST_IMG.base"
+
+# Aligned writes should not allocate clusters, if unmap is requested
+# and there is no backing file.
+$QEMU_IO -c "write -z -u 1M 1M" "$TEST_IMG.base" | _filter_qemu_io
+
+# Unmap can even clear previously-allocated clusters.
+$QEMU_IO -c "write 3M 1M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z -u 3M 1M" "$TEST_IMG.base" | _filter_qemu_io
+
+# Up to now, the entire image should still be unallocated.
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# But not requesting unmap must result in allocation (whether a cluster
+# allocation in compat=0.10 or a flag allocation in compat=1.1).
+$QEMU_IO -c "write -z 5M 1M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "alloc 5M $((1024*1024 / 512))" "$TEST_IMG.base" | _filter_qemu_io
+
+# Presence of a backing file overrides permission to unmap.  Again,
+# compat=0.10 images allocate, while compat=1.1 images set zero flag.
+$QEMU_IO -c "write -z -u 7M 1M" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc 7M $((1024 * 1024 / 512))" "$TEST_IMG" | _filter_qemu_io
+
+# Final check that images are still sane.
+TEST_IMG="$TEST_IMG.base" _check_test_img
+_check_test_img
+
+# success, all done
+echo '*** done'
+status=0
diff --git a/tests/qemu-iotests/179.out b/tests/qemu-iotests/179.out
new file mode 100644
index 0000000..fc97b19
--- /dev/null
+++ b/tests/qemu-iotests/179.out
@@ -0,0 +1,22 @@
+QA output created by 179
+
+=== Testing write zeroes with unmap ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=16777216
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=16777216 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 3145728
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 3145728
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 16777216, "depth": 1, "zero": true, "data": false}]
+wrote 1048576/1048576 bytes at offset 5242880
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 sectors allocated at offset 5 MiB
+wrote 1048576/1048576 bytes at offset 7340032
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 sectors allocated at offset 7 MiB
+No errors were found on the image.
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 43142dd..cfc8823 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -169,3 +169,4 @@
 174 auto
 175 auto quick
 176 rw auto backing
+179 rw auto quick
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 02/13] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  2:37   ` Philippe Mathieu-Daudé
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 04/13] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

For the 'alloc' command, accepting an offset in bytes but a length
in sectors, and reporting output in sectors, is confusing.  Do
everything in bytes, and adjust the expected output accordingly.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qemu-io-cmds.c                    | 30 ++++++++++++++++++------------
 tests/qemu-iotests/019.out        |  8 ++++----
 tests/qemu-iotests/179            |  4 ++--
 tests/qemu-iotests/179.out        |  4 ++--
 tests/qemu-iotests/common.pattern |  2 +-
 5 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 9e023a4..df7297f 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1760,7 +1760,7 @@ out:
 static int alloc_f(BlockBackend *blk, int argc, char **argv)
 {
     BlockDriverState *bs = blk_bs(blk);
-    int64_t offset, sector_num, nb_sectors, remaining;
+    int64_t offset, sector_num, nb_sectors, remaining, bytes;
     char s1[64];
     int num, ret;
     int64_t sum_alloc;
@@ -1776,18 +1776,24 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
     }

     if (argc == 3) {
-        nb_sectors = cvtnum(argv[2]);
-        if (nb_sectors < 0) {
-            print_cvtnum_err(nb_sectors, argv[2]);
+        bytes = cvtnum(argv[2]);
+        if (bytes < 0) {
+            print_cvtnum_err(bytes, argv[2]);
             return 0;
-        } else if (nb_sectors > INT_MAX) {
-            printf("length argument cannot exceed %d, given %s\n",
-                   INT_MAX, argv[2]);
+        } else if (bytes > INT_MAX * BDRV_SECTOR_SIZE) {
+            printf("length argument cannot exceed %llu, given %s\n",
+                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
             return 0;
         }
     } else {
-        nb_sectors = 1;
+        bytes = BDRV_SECTOR_SIZE;
     }
+    if (bytes & 0x1ff) {
+        printf("bytes %" PRId64 " is not sector aligned\n",
+               bytes);
+        return 0;
+    }
+    nb_sectors = bytes >> BDRV_SECTOR_BITS;

     remaining = nb_sectors;
     sum_alloc = 0;
@@ -1811,8 +1817,8 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)

     cvtstr(offset, s1, sizeof(s1));

-    printf("%"PRId64"/%"PRId64" sectors allocated at offset %s\n",
-           sum_alloc, nb_sectors, s1);
+    printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
+           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
     return 0;
 }

@@ -1822,8 +1828,8 @@ static const cmdinfo_t alloc_cmd = {
     .argmin     = 1,
     .argmax     = 2,
     .cfunc      = alloc_f,
-    .args       = "off [sectors]",
-    .oneline    = "checks if a sector is present in the file",
+    .args       = "offset [bytes]",
+    .oneline    = "checks if offset is allocated in the file",
 };


diff --git a/tests/qemu-iotests/019.out b/tests/qemu-iotests/019.out
index 0124264..17a7c03 100644
--- a/tests/qemu-iotests/019.out
+++ b/tests/qemu-iotests/019.out
@@ -542,8 +542,8 @@ Testing conversion with -B TEST_DIR/t.IMGFMT.base

 Checking if backing clusters are allocated when they shouldn't

-0/128 sectors allocated at offset 1 MiB
-0/128 sectors allocated at offset 4.001 GiB
+0/65536 bytes allocated at offset 1 MiB
+0/65536 bytes allocated at offset 4.001 GiB
 Reading

 === IO: pattern 42
@@ -1086,8 +1086,8 @@ Testing conversion with -o backing_file=TEST_DIR/t.IMGFMT.base

 Checking if backing clusters are allocated when they shouldn't

-0/128 sectors allocated at offset 1 MiB
-0/128 sectors allocated at offset 4.001 GiB
+0/65536 bytes allocated at offset 1 MiB
+0/65536 bytes allocated at offset 4.001 GiB
 Reading

 === IO: pattern 42
diff --git a/tests/qemu-iotests/179 b/tests/qemu-iotests/179
index 338a45d..44541e1 100755
--- a/tests/qemu-iotests/179
+++ b/tests/qemu-iotests/179
@@ -62,12 +62,12 @@ $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 # But not requesting unmap must result in allocation (whether a cluster
 # allocation in compat=0.10 or a flag allocation in compat=1.1).
 $QEMU_IO -c "write -z 5M 1M" "$TEST_IMG.base" | _filter_qemu_io
-$QEMU_IO -c "alloc 5M $((1024*1024 / 512))" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "alloc 5M 1M" "$TEST_IMG.base" | _filter_qemu_io

 # Presence of a backing file overrides permission to unmap.  Again,
 # compat=0.10 images allocate, while compat=1.1 images set zero flag.
 $QEMU_IO -c "write -z -u 7M 1M" "$TEST_IMG" | _filter_qemu_io
-$QEMU_IO -c "alloc 7M $((1024 * 1024 / 512))" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "alloc 7M 1M" "$TEST_IMG" | _filter_qemu_io

 # Final check that images are still sane.
 TEST_IMG="$TEST_IMG.base" _check_test_img
diff --git a/tests/qemu-iotests/179.out b/tests/qemu-iotests/179.out
index fc97b19..18ecf0f 100644
--- a/tests/qemu-iotests/179.out
+++ b/tests/qemu-iotests/179.out
@@ -13,10 +13,10 @@ wrote 1048576/1048576 bytes at offset 3145728
 [{ "start": 0, "length": 16777216, "depth": 1, "zero": true, "data": false}]
 wrote 1048576/1048576 bytes at offset 5242880
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-2048/2048 sectors allocated at offset 5 MiB
+1048576/1048576 bytes allocated at offset 5 MiB
 wrote 1048576/1048576 bytes at offset 7340032
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-2048/2048 sectors allocated at offset 7 MiB
+1048576/1048576 bytes allocated at offset 7 MiB
 No errors were found on the image.
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/common.pattern b/tests/qemu-iotests/common.pattern
index ddfbca1..34f4a8d 100644
--- a/tests/qemu-iotests/common.pattern
+++ b/tests/qemu-iotests/common.pattern
@@ -18,7 +18,7 @@

 function do_is_allocated() {
     local start=$1
-    local size=$(( $2 / 512))
+    local size=$2
     local step=$3
     local count=$4

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 04/13] qemu-io: Switch 'map' output to byte-based reporting
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (2 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 05/13] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Mixing byte offset and sector allocation counts is a bit
confusing.  Also, reporting n/m sectors, where m decreases
according to the remaining size of the file, isn't really
adding any useful information.  Update the output to use
byte counts, and adjust the affected tests (./check -qcow2 102,
./check -vpc 146).

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qemu-io-cmds.c             |  5 ++---
 tests/qemu-iotests/102.out |  4 ++--
 tests/qemu-iotests/146.out | 30 +++++++++++++++---------------
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index df7297f..ed79c30 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1895,9 +1895,8 @@ static int map_f(BlockBackend *blk, int argc, char **argv)

         retstr = ret ? "    allocated" : "not allocated";
         cvtstr(offset << 9ULL, s1, sizeof(s1));
-        printf("[% 24" PRId64 "] % 8" PRId64 "/% 8" PRId64 " sectors %s "
-               "at offset %s (%d)\n",
-               offset << 9ULL, num, nb_sectors, retstr, s1, ret);
+        printf("[% 24" PRId64 "] % 16" PRId64 " bytes %s at offset %s (%d)\n",
+               offset << 9ULL, num << 9ULL, retstr, s1, ret);

         offset += num;
         nb_sectors -= num;
diff --git a/tests/qemu-iotests/102.out b/tests/qemu-iotests/102.out
index eecde16..0e0ae4c 100644
--- a/tests/qemu-iotests/102.out
+++ b/tests/qemu-iotests/102.out
@@ -6,7 +6,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Image resized.
-[                       0]      128/     128 sectors     allocated at offset 0 bytes (1)
+[                       0]            65536 bytes     allocated at offset 0 bytes (1)
 Offset          Length          Mapped to       File

 === Testing map on an image file truncated outside of qemu ===
@@ -17,5 +17,5 @@ wrote 65536/65536 bytes at offset 0
 Image resized.
 QEMU X.Y.Z monitor - type 'help' for more information
 (qemu) qemu-io drv0 map
-[                       0]      128/     128 sectors     allocated at offset 0 bytes (1)
+[                       0]            65536 bytes     allocated at offset 0 bytes (1)
 *** done
diff --git a/tests/qemu-iotests/146.out b/tests/qemu-iotests/146.out
index 4f334d8..d4e2f62 100644
--- a/tests/qemu-iotests/146.out
+++ b/tests/qemu-iotests/146.out
@@ -2,39 +2,39 @@ QA output created by 146

 === Testing VPC Autodetect ===

-[                       0]  266334240/ 266334240 sectors not allocated at offset 0 bytes (0)
+[                       0]     136363130880 bytes not allocated at offset 0 bytes (0)

 === Testing VPC with current_size force ===

-[                       0]  266338304/ 266338304 sectors not allocated at offset 0 bytes (0)
+[                       0]     136365211648 bytes not allocated at offset 0 bytes (0)

 === Testing VPC with chs force ===

-[                       0]  266334240/ 266334240 sectors not allocated at offset 0 bytes (0)
+[                       0]     136363130880 bytes not allocated at offset 0 bytes (0)

 === Testing Hyper-V Autodetect ===

-[                       0]  266338304/ 266338304 sectors not allocated at offset 0 bytes (0)
+[                       0]     136365211648 bytes not allocated at offset 0 bytes (0)

 === Testing Hyper-V with current_size force ===

-[                       0]  266338304/ 266338304 sectors not allocated at offset 0 bytes (0)
+[                       0]     136365211648 bytes not allocated at offset 0 bytes (0)

 === Testing Hyper-V with chs force ===

-[                       0]  266334240/ 266334240 sectors not allocated at offset 0 bytes (0)
+[                       0]     136363130880 bytes not allocated at offset 0 bytes (0)

 === Testing d2v Autodetect ===

-[                       0]   514560/  514560 sectors     allocated at offset 0 bytes (1)
+[                       0]        263454720 bytes     allocated at offset 0 bytes (1)

 === Testing d2v with current_size force ===

-[                       0]   514560/  514560 sectors     allocated at offset 0 bytes (1)
+[                       0]        263454720 bytes     allocated at offset 0 bytes (1)

 === Testing d2v with chs force ===

-[                       0]   514560/  514560 sectors     allocated at offset 0 bytes (1)
+[                       0]        263454720 bytes     allocated at offset 0 bytes (1)

 === Testing Image create, default ===

@@ -42,15 +42,15 @@ Formatting 'TEST_DIR/IMGFMT-create-test.IMGFMT', fmt=IMGFMT size=4294967296

 === Read created image, default opts ====

-[                       0]  8389584/ 8389584 sectors not allocated at offset 0 bytes (0)
+[                       0]       4295467008 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=chs ====

-[                       0]  8389584/ 8389584 sectors not allocated at offset 0 bytes (0)
+[                       0]       4295467008 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=current_size ====

-[                       0]  8389584/ 8389584 sectors not allocated at offset 0 bytes (0)
+[                       0]       4295467008 bytes not allocated at offset 0 bytes (0)

 === Testing Image create, force_size ===

@@ -58,13 +58,13 @@ Formatting 'TEST_DIR/IMGFMT-create-test.IMGFMT', fmt=IMGFMT size=4294967296 forc

 === Read created image, default opts ====

-[                       0]  8388608/ 8388608 sectors not allocated at offset 0 bytes (0)
+[                       0]       4294967296 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=chs ====

-[                       0]  8388608/ 8388608 sectors not allocated at offset 0 bytes (0)
+[                       0]       4294967296 bytes not allocated at offset 0 bytes (0)

 === Read created image, force_size_calc=current_size ====

-[                       0]  8388608/ 8388608 sectors not allocated at offset 0 bytes (0)
+[                       0]       4294967296 bytes not allocated at offset 0 bytes (0)
 *** done
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 05/13] qcow2: Optimize write zero of unaligned tail cluster
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (3 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 04/13] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 06/13] qcow2: Assert that cluster operations are aligned Eric Blake
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

We've already improved discards to operate efficiently on the tail
of an unaligned qcow2 image; it's time to make a similar improvement
to write zeroes.  The special case is only valid at the tail
cluster of a file, where we must recognize that any sectors beyond
the image end would implicitly read as zero, and therefore should
not penalize our logic for widening a partial cluster into writing
the whole cluster as zero.

Update test 154 to cover the new scenarios, using two images of
intentionally differing length.

While at it, fix the test to gracefully skip when run as
./check -qcow2 -o compat=0.10 154
since the older format lacks zero clusters already required earlier
in the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c              |   7 +++
 tests/qemu-iotests/154     | 111 ++++++++++++++++++++++++++++++++++++++++++++-
 tests/qemu-iotests/154.out |  82 +++++++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5c1573c..8038793 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2449,6 +2449,10 @@ static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
     BlockDriverState *file;
     int64_t res;

+    if (start + count > bs->total_sectors) {
+        count = bs->total_sectors - start;
+    }
+
     if (!count) {
         return true;
     }
@@ -2467,6 +2471,9 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     uint32_t tail = (offset + count) % s->cluster_size;

     trace_qcow2_pwrite_zeroes_start_req(qemu_coroutine_self(), offset, count);
+    if (offset + count == bs->total_sectors * BDRV_SECTOR_SIZE) {
+        tail = 0;
+    }

     if (head || tail) {
         int64_t cl_start = (offset - head) >> BDRV_SECTOR_BITS;
diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index 7ca7219..66128f5 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -2,7 +2,7 @@
 #
 # qcow2 specific bdrv_pwrite_zeroes tests with backing files (complements 034)
 #
-# Copyright (C) 2016 Red Hat, Inc.
+# Copyright (C) 2016-2017 Red Hat, Inc.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -42,7 +42,10 @@ _supported_proto file
 _supported_os Linux

 CLUSTER_SIZE=4k
-size=128M
+size=$((128 * 1024 * 1024))
+
+# This test requires zero clusters, added in v3 images
+_unsupported_imgopts compat=0.10

 echo
 echo == backing file contains zeros ==
@@ -299,6 +302,110 @@ $QEMU_IO -c "read -P 0 75k 1k" "$TEST_IMG" | _filter_qemu_io

 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

+echo
+echo == unaligned image tail cluster, no allocation needed ==
+
+CLUSTER_SIZE=1024 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# With no backing file, write to all or part of unallocated partial cluster
+
+# Backing file: 128m: ... | 00 --
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Backing file: 128m: ... | -- 00
+$QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Backing file: 128m: ... | 00 00
+$QEMU_IO -c "write -z $size 1024" "$TEST_IMG.base" | _filter_qemu_io
+
+# No offset should be allocated, although the cluster itself is considered
+# allocated due to the zero flag
+$QEMU_IO -c "alloc $size 1024" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+# With backing file that reads as zero, write to all or part of entire cluster
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | 00 00 00 00
+$QEMU_IO -c "write -z $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | 00 00 -- --
+$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | -- -- 00 00
+$QEMU_IO -c "write -z $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+
+# Backing file: 128m: ... | 00 00
+# Active layer: 128m: ... | -- 00 00 --
+$QEMU_IO -c "write -z $((size + 512)) 1024" "$TEST_IMG" | _filter_qemu_io
+
+# No offset should be allocated, although the cluster itself is considered
+# allocated due to the zero flag
+$QEMU_IO -c "alloc $size 2048" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+# Preallocated cluster is not freed when unmap is not requested
+
+# Backing file: 128m: ... | XX -- => ... | XX 00
+$QEMU_IO -c "write $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
+
+# Backing file: 128m: ... | XX 00 => ... | 00 00
+$QEMU_IO -c "write -z $((size)) 1024" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG.base" | _filter_qemu_img_map
+
+echo
+echo == unaligned image tail cluster, allocation required ==
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# Write beyond backing file must COW
+# Backing file: 128m: ... | XX --
+# Active layer: 128m: ... | -- -- 00 --
+
+$QEMU_IO -c "write -P 1 $((size)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 512)) 1536" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# Writes at boundaries of (partial) cluster must not lose mid-cluster data
+# Backing file: 128m: ... | -- XX
+# Active layer: 128m: ... | 00 -- -- 00
+
+$QEMU_IO -c "write -P 1 $((size + 512)) 512" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 512)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1024)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
+_make_test_img -b "$TEST_IMG.base" $((size + 2048))
+
+# Partial write must not lose data
+# Backing file: 128m: ... | -- --
+# Active layer: 128m: ... | -- -- XX 00
+
+$QEMU_IO -c "write -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -z $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size)) 1024" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 1 $((size + 1024)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 $((size + 1536)) 512" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index da9eabd..d6e76f4 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -282,4 +282,86 @@ read 1024/1024 bytes at offset 76800
 { "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 73728, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
 { "start": 77824, "length": 134139904, "depth": 1, "zero": true, "data": false}]
+
+== unaligned image tail cluster, no allocation needed ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+1024/1024 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}]
+wrote 2048/2048 bytes at offset 134217728
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134218240
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+2048/2048 bytes allocated at offset 128 MiB
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134218752, "depth": 0, "zero": true, "data": false}]
+
+== unaligned image tail cluster, allocation required ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1536/1536 bytes at offset 134218240
+1.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134217728
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218240
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134218752
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134218752
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1024/1024 bytes at offset 134217728
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134218752
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 134219264
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": 20480}]
 *** done
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 06/13] qcow2: Assert that cluster operations are aligned
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (4 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 05/13] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count Eric Blake
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

We already audited (in commit 0c1bd469) that qcow2_discard_clusters()
is only passed cluster-aligned start values; but we can further
tighten the assertion that the only unaligned end value is at EOF.

Recent commits have taken advantage of an unaligned tail cluster,
for both discard and write zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v9: rebase to master, by asserting that only tail cluster is unaligned
v7, v8: only earlier half of series submitted for 2.9
v6: avoid assertion on non-cluster-aligned image, use s->cluster_sectors
to avoid a shift, drop R-b
v5: no change
v4: new patch
---
 block/qcow2-cluster.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 12f44b2..362a855 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1519,11 +1519,10 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);

-    /* The caller must cluster-align start; round end down except at EOF */
+    /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-    if (end_offset != bs->total_sectors * BDRV_SECTOR_SIZE) {
-        end_offset = start_of_cluster(s, end_offset);
-    }
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);

     nb_clusters = size_to_clusters(s, end_offset - offset);

@@ -1596,9 +1595,17 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
                         int flags)
 {
     BDRVQcow2State *s = bs->opaque;
+    uint64_t end_offset;
     uint64_t nb_clusters;
     int ret;

+    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
+
+    /* Caller must pass aligned values, except at image end */
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
+           end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+
     /* The zero flag is only supported by version 3 and newer; we
      * require the use of that flag if there is a backing file or if
      * we are not allowed to unmap.  */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (5 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 06/13] qcow2: Assert that cluster operations are aligned Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11 22:12   ` Eric Blake
  2017-04-11 22:15   ` [Qemu-devel] [PATCH v9.5 07/13] fixup! " Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 08/13] blkdebug: Sanity check block layer guarantees Eric Blake
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness.  Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster.  Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().

The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once.  All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v9: rebase to earlier changes, drop R-b
v7, v8: only earlier half of series submitted for 2.9
v6: rebase due to context
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2.h          |  9 +++++----
 block/qcow2-cluster.c  | 39 ++++++++++++++++++++++-----------------
 block/qcow2-snapshot.c |  7 +++----
 block/qcow2.c          | 21 +++++++++------------
 4 files changed, 39 insertions(+), 37 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index f8aeb08..808104c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -544,10 +544,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                          int compressed_size);

 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags);

 int qcow2_expand_zero_clusters(BlockDriverState *bs,
                                BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 362a855..595e6e3 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1509,16 +1509,16 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
-    int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, enum qcow2_discard_type type,
+                          bool full_discard)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
@@ -1530,13 +1530,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,

     /* Each L2 table is handled by its own loop iteration */
     while (nb_clusters > 0) {
-        ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
-        if (ret < 0) {
+        cleared = discard_single_l2(bs, offset, nb_clusters, type,
+                                    full_discard);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
@@ -1591,20 +1593,22 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
     return nb_clusters;
 }

-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
-                        int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+                          uint64_t bytes, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t end_offset;
     uint64_t nb_clusters;
+    int64_t cleared;
     int ret;

-    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
+    end_offset = offset + bytes;

     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
            end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
+    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));

     /* The zero flag is only supported by version 3 and newer; we
      * require the use of that flag if there is a backing file or if
@@ -1615,18 +1619,19 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
     }

     /* Each L2 table is handled by its own loop iteration */
-    nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
+    nb_clusters = size_to_clusters(s, bytes);

     s->cache_discards = true;

     while (nb_clusters > 0) {
-        ret = zero_single_l2(bs, offset, nb_clusters, flags);
-        if (ret < 0) {
+        cleared = zero_single_l2(bs, offset, nb_clusters, flags);
+        if (cleared < 0) {
+            ret = cleared;
             goto fail;
         }

-        nb_clusters -= ret;
-        offset += (ret * s->cluster_size);
+        nb_clusters -= cleared;
+        offset += (cleared * s->cluster_size);
     }

     ret = 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0324243..44243e0 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -440,10 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)

     /* The VM state isn't needed any more in the active L1 table; in fact, it
      * hurts by causing expensive COW for the next snapshot. */
-    qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
-                           align_offset(sn->vm_state_size, s->cluster_size)
-                                >> BDRV_SECTOR_BITS,
-                           QCOW2_DISCARD_NEVER, false);
+    qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
+                          align_offset(sn->vm_state_size, s->cluster_size),
+                          QCOW2_DISCARD_NEVER, false);

 #ifdef DEBUG_ALLOC
     {
diff --git a/block/qcow2.c b/block/qcow2.c
index 8038793..4d34610 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2508,7 +2508,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);

     /* Whatever is left can use real zero clusters */
-    ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
+    ret = qcow2_cluster_zeroize(bs, offset, count, flags);
     qemu_co_mutex_unlock(&s->lock);

     return ret;
@@ -2531,8 +2531,8 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
     }

     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
-                                 QCOW2_DISCARD_REQUEST, false);
+    ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
+                                false);
     qemu_co_mutex_unlock(&s->lock);
     return ret;
 }
@@ -2839,9 +2839,8 @@ fail:
 static int qcow2_make_empty(BlockDriverState *bs)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t start_sector;
-    int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
-                       BDRV_SECTOR_SIZE);
+    uint64_t offset, end_offset;
+    int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
     int l1_clusters, ret = 0;

     l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2858,18 +2857,16 @@ static int qcow2_make_empty(BlockDriverState *bs)

     /* This fallback code simply discards every active cluster; this is slow,
      * but works in all cases */
-    for (start_sector = 0; start_sector < bs->total_sectors;
-         start_sector += sector_step)
+    end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
+    for (offset = 0; offset < end_offset; offset += step)
     {
         /* As this function is generally used after committing an external
          * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
          * default action for this kind of discard is to pass the discard,
          * which will ideally result in an actually smaller image file, as
          * is probably desired. */
-        ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
-                                     MIN(sector_step,
-                                         bs->total_sectors - start_sector),
-                                     QCOW2_DISCARD_SNAPSHOT, true);
+        ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
+                                    QCOW2_DISCARD_SNAPSHOT, true);
         if (ret < 0) {
             break;
         }
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 08/13] blkdebug: Sanity check block layer guarantees
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (6 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 09/13] blkdebug: Refactor error injection Eric Blake
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Commits 04ed95f4 and 1a62d0ac updated the block layer to auto-fragment
any I/O to fit within device boundaries. Additionally, when using a
minimum alignment of 4k, we want to ensure the block layer does proper
read-modify-write rather than requesting I/O on a slice of a sector.
Let's enforce that the contract is obeyed when using blkdebug.  For
now, blkdebug only allows alignment overrides, and just inherits other
limits from whatever device it is wrapping, but a future patch will
further enhance things.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v5-v9: no change
v4: no change
v3: rebase to byte-based interfaces
v2: new patch
---
 block/blkdebug.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index c795ae9..14d3fc5 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -431,6 +431,13 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;

+    /* Sanity check block layer guarantees */
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+    if (bs->bl.max_transfer) {
+        assert(bytes <= bs->bl.max_transfer);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         uint64_t inject_offset = rule->options.inject.offset;

@@ -455,6 +462,13 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
     BDRVBlkdebugState *s = bs->opaque;
     BlkdebugRule *rule = NULL;

+    /* Sanity check block layer guarantees */
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
+    if (bs->bl.max_transfer) {
+        assert(bytes <= bs->bl.max_transfer);
+    }
+
     QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
         uint64_t inject_offset = rule->options.inject.offset;

-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 09/13] blkdebug: Refactor error injection
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (7 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 08/13] blkdebug: Sanity check block layer guarantees Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 10/13] blkdebug: Add pass-through write_zero and discard support Eric Blake
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Rather than repeat the logic at each caller of checking if a Rule
exists that warrants an error injection, fold that logic into
inject_error(); and rename it to rule_check() for legibility.
This will help the next patch, which adds two more callers that
need to check rules for the potential of injecting errors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: new patch
---
 block/blkdebug.c | 74 +++++++++++++++++++++++++-------------------------------
 1 file changed, 33 insertions(+), 41 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 14d3fc5..db58db0 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -405,11 +405,30 @@ out:
     return ret;
 }

-static int inject_error(BlockDriverState *bs, BlkdebugRule *rule)
+static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
 {
     BDRVBlkdebugState *s = bs->opaque;
-    int error = rule->options.inject.error;
-    bool immediately = rule->options.inject.immediately;
+    BlkdebugRule *rule = NULL;
+    int error;
+    bool immediately;
+
+    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
+        uint64_t inject_offset = rule->options.inject.offset;
+
+        if (inject_offset == -1 ||
+            (bytes && inject_offset >= offset &&
+             inject_offset < offset + bytes))
+        {
+            break;
+        }
+    }
+
+    if (!rule || !rule->options.inject.error) {
+        return 0;
+    }
+
+    immediately = rule->options.inject.immediately;
+    error = rule->options.inject.error;

     if (rule->options.inject.once) {
         QSIMPLEQ_REMOVE(&s->active_rules, rule, BlkdebugRule, active_next);
@@ -428,8 +447,7 @@ static int coroutine_fn
 blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                    QEMUIOVector *qiov, int flags)
 {
-    BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
+    int err;

     /* Sanity check block layer guarantees */
     assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
@@ -438,18 +456,9 @@ blkdebug_co_preadv(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         assert(bytes <= bs->bl.max_transfer);
     }

-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        uint64_t inject_offset = rule->options.inject.offset;
-
-        if (inject_offset == -1 ||
-            (inject_offset >= offset && inject_offset < offset + bytes))
-        {
-            break;
-        }
-    }
-
-    if (rule && rule->options.inject.error) {
-        return inject_error(bs, rule);
+    err = rule_check(bs, offset, bytes);
+    if (err) {
+        return err;
     }

     return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
@@ -459,8 +468,7 @@ static int coroutine_fn
 blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
                     QEMUIOVector *qiov, int flags)
 {
-    BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
+    int err;

     /* Sanity check block layer guarantees */
     assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
@@ -469,18 +477,9 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         assert(bytes <= bs->bl.max_transfer);
     }

-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        uint64_t inject_offset = rule->options.inject.offset;
-
-        if (inject_offset == -1 ||
-            (inject_offset >= offset && inject_offset < offset + bytes))
-        {
-            break;
-        }
-    }
-
-    if (rule && rule->options.inject.error) {
-        return inject_error(bs, rule);
+    err = rule_check(bs, offset, bytes);
+    if (err) {
+        return err;
     }

     return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
@@ -488,17 +487,10 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes,

 static int blkdebug_co_flush(BlockDriverState *bs)
 {
-    BDRVBlkdebugState *s = bs->opaque;
-    BlkdebugRule *rule = NULL;
+    int err = rule_check(bs, 0, 0);

-    QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
-        if (rule->options.inject.offset == -1) {
-            break;
-        }
-    }
-
-    if (rule && rule->options.inject.error) {
-        return inject_error(bs, rule);
+    if (err) {
+        return err;
     }

     return bdrv_co_flush(bs->file->bs);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 10/13] blkdebug: Add pass-through write_zero and discard support
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (8 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 09/13] blkdebug: Refactor error injection Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 11/13] blkdebug: Simplify override logic Eric Blake
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

In order to test the effects of artificial geometry constraints
on operations like write zero or discard, we first need blkdebug
to manage these actions.  It also allows us to inject errors on
those operations, just like we can for read/write/flush.

We can also test the contract promised by the block layer; namely,
if a device has specified limits on alignment or maximum size,
then those limits must be obeyed (for now, the blkdebug driver
merely inherits limits from whatever it is wrapping, but the next
patch will further enhance it to allow specific limit overrides).

This patch intentionally refuses to service requests smaller than
the requested alignments; this is because an upcoming patch adds
a qemu-iotest to prove that the block layer is correctly handling
fragmentation, but the test only works if there is a way to tell
the difference at artificial alignment boundaries when blkdebug is
using a larger-than-default alignment.  If we let the blkdebug
layer always defer to the underlying layer, which potentially has
a smaller granularity, the iotest will be thwarted.

Tested by setting up an NBD server with export 'foo', then invoking:
$ ./qemu-io
qemu-io> open -o driver=blkdebug blkdebug::nbd://localhost:10809/foo
qemu-io> d 0 15M
qemu-io> w -z 0 15M

Pre-patch, the server never sees the discard (it was silently
eaten by the block layer); post-patch it is passed across the
wire.  Likewise, pre-patch the write is always passed with
NBD_WRITE (with 15M of zeroes on the wire), while post-patch
it can utilize NBD_WRITE_ZEROES (for less traffic).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: tighten check of unaligned requests, rebase on rule check
refactoring, drop R-b
v5: include 2017 copyright
v4: correct error injection to respect byte range, tweak formatting
v3: rebase to byte-based read/write, improve docs on why no
partial write zero passthrough
v2: new patch
---
 block/blkdebug.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index db58db0..62d113c 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1,6 +1,7 @@
 /*
  * Block protocol for I/O error injection
  *
+ * Copyright (C) 2016-2017 Red Hat, Inc.
  * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -382,6 +383,11 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         goto out;
     }

+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags;
+    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags;
+
     /* Set request alignment */
     align = qemu_opt_get_size(opts, "align", 0);
     if (align < INT_MAX && is_power_of_2(align)) {
@@ -496,6 +502,72 @@ static int blkdebug_co_flush(BlockDriverState *bs)
     return bdrv_co_flush(bs->file->bs);
 }

+static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
+                                                  int64_t offset, int count,
+                                                  BdrvRequestFlags flags)
+{
+    uint32_t align = MAX(bs->bl.request_alignment,
+                         bs->bl.pwrite_zeroes_alignment);
+    int err;
+
+    /* Only pass through requests that are larger than requested
+     * preferred alignment (so that we test the fallback to writes on
+     * unaligned portions), and check that the block layer never hands
+     * us anything unaligned that crosses an alignment boundary.  */
+    if (count < align) {
+        assert(QEMU_IS_ALIGNED(offset, align) ||
+               QEMU_IS_ALIGNED(offset + count, align) ||
+               DIV_ROUND_UP(offset, align) ==
+               DIV_ROUND_UP(offset + count, align));
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, align));
+    assert(QEMU_IS_ALIGNED(count, align));
+    if (bs->bl.max_pwrite_zeroes) {
+        assert(count <= bs->bl.max_pwrite_zeroes);
+    }
+
+    err = rule_check(bs, offset, count);
+    if (err) {
+        return err;
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
+}
+
+static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int count)
+{
+    uint32_t align = bs->bl.pdiscard_alignment;
+    int err;
+
+    /* Only pass through requests that are larger than requested
+     * minimum alignment, and ensure that unaligned requests do not
+     * cross optimum discard boundaries. */
+    if (count < bs->bl.request_alignment) {
+        assert(QEMU_IS_ALIGNED(offset, align) ||
+               QEMU_IS_ALIGNED(offset + count, align) ||
+               DIV_ROUND_UP(offset, align) ==
+               DIV_ROUND_UP(offset + count, align));
+        return -ENOTSUP;
+    }
+    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment));
+    assert(QEMU_IS_ALIGNED(count, bs->bl.request_alignment));
+    if (align && count >= align) {
+        assert(QEMU_IS_ALIGNED(offset, align));
+        assert(QEMU_IS_ALIGNED(count, align));
+    }
+    if (bs->bl.max_pdiscard) {
+        assert(count <= bs->bl.max_pdiscard);
+    }
+
+    err = rule_check(bs, offset, count);
+    if (err) {
+        return err;
+    }
+
+    return bdrv_co_pdiscard(bs->file->bs, offset, count);
+}

 static void blkdebug_close(BlockDriverState *bs)
 {
@@ -750,6 +822,8 @@ static BlockDriver bdrv_blkdebug = {
     .bdrv_co_preadv         = blkdebug_co_preadv,
     .bdrv_co_pwritev        = blkdebug_co_pwritev,
     .bdrv_co_flush_to_disk  = blkdebug_co_flush,
+    .bdrv_co_pwrite_zeroes  = blkdebug_co_pwrite_zeroes,
+    .bdrv_co_pdiscard       = blkdebug_co_pdiscard,

     .bdrv_debug_event           = blkdebug_debug_event,
     .bdrv_debug_breakpoint      = blkdebug_debug_breakpoint,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 11/13] blkdebug: Simplify override logic
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (9 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 10/13] blkdebug: Add pass-through write_zero and discard support Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 12/13] blkdebug: Add ability to override unmap geometries Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 13/13] tests: Add coverage for recent block geometry fixes Eric Blake
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Rather than store into a local variable, then copy to the struct
if the value is valid, then reporting errors otherwise, it is
simpler to just store into the struct and report errors if the
value is invalid.  This however requires that the struct store
a 64-bit number, rather than a narrower type.  Move the errno
assignment into a label that will be reused from more places in
the next patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

---
v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: tweak error message, but R-b kept
v5: no change
v4: fix typo in commit message, move errno assignment
v3: new patch
---
 block/blkdebug.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 62d113c..a1501eb 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -38,7 +38,7 @@
 typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
-    int align;
+    uint64_t align;

     /* For blkdebug_refresh_filename() */
     char *config_file;
@@ -353,7 +353,6 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     BDRVBlkdebugState *s = bs->opaque;
     QemuOpts *opts;
     Error *local_err = NULL;
-    uint64_t align;
     int ret;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
@@ -389,12 +388,10 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
         bs->file->bs->supported_zero_flags;

     /* Set request alignment */
-    align = qemu_opt_get_size(opts, "align", 0);
-    if (align < INT_MAX && is_power_of_2(align)) {
-        s->align = align;
-    } else if (align) {
-        error_setg(errp, "Invalid alignment");
-        ret = -EINVAL;
+    s->align = qemu_opt_get_size(opts, "align", 0);
+    if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
+        error_setg(errp, "Cannot meet constraints with align %" PRIu64,
+                   s->align);
         goto fail_unref;
     }

@@ -402,6 +399,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     goto out;

 fail_unref:
+    ret = -EINVAL;
     bdrv_unref_child(bs, bs->file);
 out:
     if (ret < 0) {
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 12/13] blkdebug: Add ability to override unmap geometries
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (10 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 11/13] blkdebug: Simplify override logic Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 13/13] tests: Add coverage for recent block geometry fixes Eric Blake
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, Markus Armbruster

Make it easier to simulate various unusual hardware setups (for
example, recent commits 3482b9b and b8d0a98 affect the Dell
Equallogic iSCSI with its 15M preferred and maximum unmap and
write zero sizing, or b2f95fe deals with the Linux loopback
block device having a max_transfer of 64k), by allowing blkdebug
to wrap any other device with further restrictions on various
alignments.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---
v9: rebase to master (dropped #optional tags)
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: more tweaks in docs and error messages
v5: tweak docs regarding max-transfer minimum
v4: relax 512 byte minimum now that blkdebug is byte-based, fix doc typo
v3: improve legibility of bounds checking, improve docs
v2: new patch
---
 qapi/block-core.json | 33 ++++++++++++++++--
 block/blkdebug.c     | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 033457c..38edada 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2428,8 +2428,33 @@
 #
 # @config:          filename of the configuration file
 #
-# @align:           required alignment for requests in bytes,
-#                   must be power of 2, or 0 for default
+# @align:           required alignment for requests in bytes, must be
+#                   positive power of 2, or 0 for default
+#
+# @max-transfer:    maximum size for I/O transfers in bytes, must be
+#                   positive multiple of @align and of the underlying
+#                   file's request alignment (but need not be a power of
+#                   2), or 0 for default (since 2.10)
+#
+# @opt-write-zero:  preferred alignment for write zero requests in bytes,
+#                   must be positive multiple of @align and of the
+#                   underlying file's request alignment (but need not be a
+#                   power of 2), or 0 for default (since 2.10)
+#
+# @max-write-zero:  maximum size for write zero requests in bytes, must be
+#                   positive multiple of @align, of @opt-write-zero, and of
+#                   the underlying file's request alignment (but need not
+#                   be a power of 2), or 0 for default (since 2.10)
+#
+# @opt-discard:     preferred alignment for discard requests in bytes, must
+#                   be positive multiple of @align and of the underlying
+#                   file's request alignment (but need not be a power of
+#                   2), or 0 for default (since 2.10)
+#
+# @max-discard:     maximum size for discard requests in bytes, must be
+#                   positive multiple of @align, of @opt-discard, and of
+#                   the underlying file's request alignment (but need not
+#                   be a power of 2), or 0 for default (since 2.10)
 #
 # @inject-error:    array of error injection descriptions
 #
@@ -2440,7 +2465,9 @@
 { 'struct': 'BlockdevOptionsBlkdebug',
   'data': { 'image': 'BlockdevRef',
             '*config': 'str',
-            '*align': 'int',
+            '*align': 'int', '*max-transfer': 'int32',
+            '*opt-write-zero': 'int32', '*max-write-zero': 'int32',
+            '*opt-discard': 'int32', '*max-discard': 'int32',
             '*inject-error': ['BlkdebugInjectErrorOptions'],
             '*set-state': ['BlkdebugSetStateOptions'] } }

diff --git a/block/blkdebug.c b/block/blkdebug.c
index a1501eb..4ccdceb 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -39,6 +39,11 @@ typedef struct BDRVBlkdebugState {
     int state;
     int new_state;
     uint64_t align;
+    uint64_t max_transfer;
+    uint64_t opt_write_zero;
+    uint64_t max_write_zero;
+    uint64_t opt_discard;
+    uint64_t max_discard;

     /* For blkdebug_refresh_filename() */
     char *config_file;
@@ -343,6 +348,31 @@ static QemuOptsList runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Required alignment in bytes",
         },
+        {
+            .name = "max-transfer",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum transfer size in bytes",
+        },
+        {
+            .name = "opt-write-zero",
+            .type = QEMU_OPT_SIZE,
+            .help = "Optimum write zero alignment in bytes",
+        },
+        {
+            .name = "max-write-zero",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum write zero size in bytes",
+        },
+        {
+            .name = "opt-discard",
+            .type = QEMU_OPT_SIZE,
+            .help = "Optimum discard alignment in bytes",
+        },
+        {
+            .name = "max-discard",
+            .type = QEMU_OPT_SIZE,
+            .help = "Maximum discard size in bytes",
+        },
         { /* end of list */ }
     },
 };
@@ -354,6 +384,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     QemuOpts *opts;
     Error *local_err = NULL;
     int ret;
+    uint64_t align;

     opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
     qemu_opts_absorb_qdict(opts, options, &local_err);
@@ -387,13 +418,61 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
     bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
         bs->file->bs->supported_zero_flags;

-    /* Set request alignment */
+    /* Set alignment overrides */
     s->align = qemu_opt_get_size(opts, "align", 0);
     if (s->align && (s->align >= INT_MAX || !is_power_of_2(s->align))) {
         error_setg(errp, "Cannot meet constraints with align %" PRIu64,
                    s->align);
         goto fail_unref;
     }
+    align = MAX(s->align, bs->file->bs->bl.request_alignment);
+
+    s->max_transfer = qemu_opt_get_size(opts, "max-transfer", 0);
+    if (s->max_transfer &&
+        (s->max_transfer >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_transfer, align))) {
+        error_setg(errp, "Cannot meet constraints with max-transfer %" PRIu64,
+                   s->max_transfer);
+        goto fail_unref;
+    }
+
+    s->opt_write_zero = qemu_opt_get_size(opts, "opt-write-zero", 0);
+    if (s->opt_write_zero &&
+        (s->opt_write_zero >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->opt_write_zero, align))) {
+        error_setg(errp, "Cannot meet constraints with opt-write-zero %" PRIu64,
+                   s->opt_write_zero);
+        goto fail_unref;
+    }
+
+    s->max_write_zero = qemu_opt_get_size(opts, "max-write-zero", 0);
+    if (s->max_write_zero &&
+        (s->max_write_zero >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_write_zero,
+                          MAX(s->opt_write_zero, align)))) {
+        error_setg(errp, "Cannot meet constraints with max-write-zero %" PRIu64,
+                   s->max_write_zero);
+        goto fail_unref;
+    }
+
+    s->opt_discard = qemu_opt_get_size(opts, "opt-discard", 0);
+    if (s->opt_discard &&
+        (s->opt_discard >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->opt_discard, align))) {
+        error_setg(errp, "Cannot meet constraints with opt-discard %" PRIu64,
+                   s->opt_discard);
+        goto fail_unref;
+    }
+
+    s->max_discard = qemu_opt_get_size(opts, "max-discard", 0);
+    if (s->max_discard &&
+        (s->max_discard >= INT_MAX ||
+         !QEMU_IS_ALIGNED(s->max_discard,
+                          MAX(s->opt_discard, align)))) {
+        error_setg(errp, "Cannot meet constraints with max-discard %" PRIu64,
+                   s->max_discard);
+        goto fail_unref;
+    }

     ret = 0;
     goto out;
@@ -793,6 +872,21 @@ static void blkdebug_refresh_limits(BlockDriverState *bs, Error **errp)
     if (s->align) {
         bs->bl.request_alignment = s->align;
     }
+    if (s->max_transfer) {
+        bs->bl.max_transfer = s->max_transfer;
+    }
+    if (s->opt_write_zero) {
+        bs->bl.pwrite_zeroes_alignment = s->opt_write_zero;
+    }
+    if (s->max_write_zero) {
+        bs->bl.max_pwrite_zeroes = s->max_write_zero;
+    }
+    if (s->opt_discard) {
+        bs->bl.pdiscard_alignment = s->opt_discard;
+    }
+    if (s->max_discard) {
+        bs->bl.max_pdiscard = s->max_discard;
+    }
 }

 static int blkdebug_reopen_prepare(BDRVReopenState *reopen_state,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9 13/13] tests: Add coverage for recent block geometry fixes
  2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
                   ` (11 preceding siblings ...)
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 12/13] blkdebug: Add ability to override unmap geometries Eric Blake
@ 2017-04-11  1:17 ` Eric Blake
  12 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11  1:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz

Use blkdebug's new geometry constraints to emulate setups that
have caused recent regression fixes: write zeroes asserting
when running through a loopback block device with max-transfer
smaller than cluster size, and discard rounding away portions
of requests not aligned to preferred boundaries.  Also, add
coverage that the block layer is honoring max transfer limits.

For now, a single iotest performs all actions, with the idea
that we can add future blkdebug constraint test cases in the
same file; but it can be split into multiple iotests if we find
reason to run one portion of the test in more setups than what
are possible in the other.

For reference, the final portion of the test (checking whether
discard passes as much as possible to the lowest layers of the
stack) works as follows:

qemu-io: discard 30M at 80000001, passed to blkdebug
  blkdebug: discard 511 bytes at 80000001, -ENOTSUP (smaller than
blkdebug's 512 align)
  blkdebug: discard 14371328 bytes at 80000512, passed to qcow2
    qcow2: discard 739840 bytes at 80000512, -ENOTSUP (smaller than
qcow2's 1M align)
    qcow2: discard 13M bytes at 77M, succeeds
  blkdebug: discard 15M bytes at 90M, passed to qcow2
    qcow2: discard 15M bytes at 90M, succeeds
  blkdebug: discard 1356800 bytes at 105M, passed to qcow2
    qcow2: discard 1M at 105M, succeeds
    qcow2: discard 308224 bytes at 106M, -ENOTSUP (smaller than qcow2's
1M align)
  blkdebug: discard 1 byte at 111457280, -ENOTSUP (smaller than
blkdebug's 512 align)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

---

[yes, I added tests 177 and 179 out of order during this series.
Oh well; I'm tired of renumbering this one.]

v9: no change
v7-v8: not submitted (earlier half of series sent for 2.9)
v6: rebase to master by renumbering s/175/177/
v5: rebase to master by renumbering s/173/175/
v4: clean up some comments, nicer backing file creation, more commit message
v3: make comments tied more to test at hand, rather than the
particular hardware that led to the earlier patches being tested
v2: new patch
---
 tests/qemu-iotests/177     | 114 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/177.out |  49 +++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 164 insertions(+)
 create mode 100755 tests/qemu-iotests/177
 create mode 100644 tests/qemu-iotests/177.out

diff --git a/tests/qemu-iotests/177 b/tests/qemu-iotests/177
new file mode 100755
index 0000000..e4ddec7
--- /dev/null
+++ b/tests/qemu-iotests/177
@@ -0,0 +1,114 @@
+#!/bin/bash
+#
+# Test corner cases with unusual block geometries
+#
+# Copyright (C) 2016-2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=eblake@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+
+CLUSTER_SIZE=1M
+size=128M
+options=driver=blkdebug,image.driver=qcow2
+
+echo
+echo "== setting up files =="
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+$QEMU_IO -c "write -P 11 0 $size" "$TEST_IMG.base" | _filter_qemu_io
+_make_test_img -b "$TEST_IMG.base"
+$QEMU_IO -c "write -P 22 0 $size" "$TEST_IMG" | _filter_qemu_io
+
+# Limited to 64k max-transfer
+echo
+echo "== constrained alignment and max-transfer =="
+limits=align=4k,max-transfer=64k
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -P 33 1000 128k" -c "read -P 33 1000 128k" | _filter_qemu_io
+
+echo
+echo "== write zero with constrained max-transfer =="
+limits=align=512,max-transfer=64k,opt-write-zero=$CLUSTER_SIZE
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -z 8003584 2093056" | _filter_qemu_io
+
+# non-power-of-2 write-zero/discard alignments
+echo
+echo "== non-power-of-2 write zeroes limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "write -z 32M 32M" | _filter_qemu_io
+
+echo
+echo "== non-power-of-2 discard limits =="
+
+limits=align=512,opt-write-zero=15M,max-write-zero=15M,opt-discard=15M,max-discard=15M
+$QEMU_IO -c "open -o $options,$limits blkdebug::$TEST_IMG" \
+         -c "discard 80000001 30M" | _filter_qemu_io
+
+echo
+echo "== verify image content =="
+
+function verify_io()
+{
+    if ($QEMU_IMG info -f "$IMGFMT" "$TEST_IMG" |
+	    grep "compat: 0.10" > /dev/null); then
+        # For v2 images, discarded clusters are read from the backing file
+        discarded=11
+    else
+        # Discarded clusters are zeroed for v3 or later
+        discarded=0
+    fi
+
+    echo read -P 22 0 1000
+    echo read -P 33 1000 128k
+    echo read -P 22 132072 7871512
+    echo read -P 0 8003584 2093056
+    echo read -P 22 10096640 23457792
+    echo read -P 0 32M 32M
+    echo read -P 22 64M 13M
+    echo read -P $discarded 77M 29M
+    echo read -P 22 106M 22M
+}
+
+verify_io | $QEMU_IO "$TEST_IMG" | _filter_qemu_io
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+status=0
diff --git a/tests/qemu-iotests/177.out b/tests/qemu-iotests/177.out
new file mode 100644
index 0000000..e887542
--- /dev/null
+++ b/tests/qemu-iotests/177.out
@@ -0,0 +1,49 @@
+QA output created by 177
+
+== setting up files ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 134217728/134217728 bytes at offset 0
+128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== constrained alignment and max-transfer ==
+wrote 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== write zero with constrained max-transfer ==
+wrote 2093056/2093056 bytes at offset 8003584
+1.996 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== non-power-of-2 write zeroes limits ==
+wrote 33554432/33554432 bytes at offset 33554432
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== non-power-of-2 discard limits ==
+discard 31457280/31457280 bytes at offset 80000001
+30 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify image content ==
+read 1000/1000 bytes at offset 0
+1000 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 1000
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 7871512/7871512 bytes at offset 132072
+7.507 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 2093056/2093056 bytes at offset 8003584
+1.996 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 23457792/23457792 bytes at offset 10096640
+22.371 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 33554432/33554432 bytes at offset 33554432
+32 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 13631488/13631488 bytes at offset 67108864
+13 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 30408704/30408704 bytes at offset 80740352
+29 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 23068672/23068672 bytes at offset 111149056
+22 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index cfc8823..e7f987e 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -169,4 +169,5 @@
 174 auto
 175 auto quick
 176 rw auto backing
+177 rw auto quick
 179 rw auto quick
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
@ 2017-04-11  2:37   ` Philippe Mathieu-Daudé
  2017-04-11 12:11     ` Eric Blake
  0 siblings, 1 reply; 26+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-04-11  2:37 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: kwolf, qemu-block, mreitz

Hi Eric,

On 04/10/2017 10:17 PM, Eric Blake wrote:
> For the 'alloc' command, accepting an offset in bytes but a length
> in sectors, and reporting output in sectors, is confusing.  Do
> everything in bytes, and adjust the expected output accordingly.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  qemu-io-cmds.c                    | 30 ++++++++++++++++++------------
>  tests/qemu-iotests/019.out        |  8 ++++----
>  tests/qemu-iotests/179            |  4 ++--
>  tests/qemu-iotests/179.out        |  4 ++--
>  tests/qemu-iotests/common.pattern |  2 +-
>  5 files changed, 27 insertions(+), 21 deletions(-)
>
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index 9e023a4..df7297f 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1760,7 +1760,7 @@ out:
>  static int alloc_f(BlockBackend *blk, int argc, char **argv)
>  {
>      BlockDriverState *bs = blk_bs(blk);
> -    int64_t offset, sector_num, nb_sectors, remaining;
> +    int64_t offset, sector_num, nb_sectors, remaining, bytes;
>      char s1[64];
>      int num, ret;
>      int64_t sum_alloc;
> @@ -1776,18 +1776,24 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>      }
>
>      if (argc == 3) {
> -        nb_sectors = cvtnum(argv[2]);
> -        if (nb_sectors < 0) {
> -            print_cvtnum_err(nb_sectors, argv[2]);
> +        bytes = cvtnum(argv[2]);
> +        if (bytes < 0) {
> +            print_cvtnum_err(bytes, argv[2]);
>              return 0;
> -        } else if (nb_sectors > INT_MAX) {
> -            printf("length argument cannot exceed %d, given %s\n",
> -                   INT_MAX, argv[2]);
> +        } else if (bytes > INT_MAX * BDRV_SECTOR_SIZE) {
> +            printf("length argument cannot exceed %llu, given %s\n",
> +                   INT_MAX * BDRV_SECTOR_SIZE, argv[2]);
>              return 0;
>          }
>      } else {
> -        nb_sectors = 1;
> +        bytes = BDRV_SECTOR_SIZE;
>      }
> +    if (bytes & 0x1ff) {

This macro is self-explanatory:

if(!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE))

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> +        printf("bytes %" PRId64 " is not sector aligned\n",
> +               bytes);
> +        return 0;
> +    }
> +    nb_sectors = bytes >> BDRV_SECTOR_BITS;
>
>      remaining = nb_sectors;
>      sum_alloc = 0;
> @@ -1811,8 +1817,8 @@ static int alloc_f(BlockBackend *blk, int argc, char **argv)
>
>      cvtstr(offset, s1, sizeof(s1));
>
> -    printf("%"PRId64"/%"PRId64" sectors allocated at offset %s\n",
> -           sum_alloc, nb_sectors, s1);
> +    printf("%"PRId64"/%"PRId64" bytes allocated at offset %s\n",
> +           sum_alloc << BDRV_SECTOR_BITS, nb_sectors << BDRV_SECTOR_BITS, s1);
>      return 0;
>  }
>
> @@ -1822,8 +1828,8 @@ static const cmdinfo_t alloc_cmd = {
>      .argmin     = 1,
>      .argmax     = 2,
>      .cfunc      = alloc_f,
> -    .args       = "off [sectors]",
> -    .oneline    = "checks if a sector is present in the file",
> +    .args       = "offset [bytes]",
> +    .oneline    = "checks if offset is allocated in the file",
>  };
>
>
> diff --git a/tests/qemu-iotests/019.out b/tests/qemu-iotests/019.out
> index 0124264..17a7c03 100644
> --- a/tests/qemu-iotests/019.out
> +++ b/tests/qemu-iotests/019.out
> @@ -542,8 +542,8 @@ Testing conversion with -B TEST_DIR/t.IMGFMT.base
>
>  Checking if backing clusters are allocated when they shouldn't
>
> -0/128 sectors allocated at offset 1 MiB
> -0/128 sectors allocated at offset 4.001 GiB
> +0/65536 bytes allocated at offset 1 MiB
> +0/65536 bytes allocated at offset 4.001 GiB
>  Reading
>
>  === IO: pattern 42
> @@ -1086,8 +1086,8 @@ Testing conversion with -o backing_file=TEST_DIR/t.IMGFMT.base
>
>  Checking if backing clusters are allocated when they shouldn't
>
> -0/128 sectors allocated at offset 1 MiB
> -0/128 sectors allocated at offset 4.001 GiB
> +0/65536 bytes allocated at offset 1 MiB
> +0/65536 bytes allocated at offset 4.001 GiB
>  Reading
>
>  === IO: pattern 42
> diff --git a/tests/qemu-iotests/179 b/tests/qemu-iotests/179
> index 338a45d..44541e1 100755
> --- a/tests/qemu-iotests/179
> +++ b/tests/qemu-iotests/179
> @@ -62,12 +62,12 @@ $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
>  # But not requesting unmap must result in allocation (whether a cluster
>  # allocation in compat=0.10 or a flag allocation in compat=1.1).
>  $QEMU_IO -c "write -z 5M 1M" "$TEST_IMG.base" | _filter_qemu_io
> -$QEMU_IO -c "alloc 5M $((1024*1024 / 512))" "$TEST_IMG.base" | _filter_qemu_io
> +$QEMU_IO -c "alloc 5M 1M" "$TEST_IMG.base" | _filter_qemu_io
>
>  # Presence of a backing file overrides permission to unmap.  Again,
>  # compat=0.10 images allocate, while compat=1.1 images set zero flag.
>  $QEMU_IO -c "write -z -u 7M 1M" "$TEST_IMG" | _filter_qemu_io
> -$QEMU_IO -c "alloc 7M $((1024 * 1024 / 512))" "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "alloc 7M 1M" "$TEST_IMG" | _filter_qemu_io
>
>  # Final check that images are still sane.
>  TEST_IMG="$TEST_IMG.base" _check_test_img
> diff --git a/tests/qemu-iotests/179.out b/tests/qemu-iotests/179.out
> index fc97b19..18ecf0f 100644
> --- a/tests/qemu-iotests/179.out
> +++ b/tests/qemu-iotests/179.out
> @@ -13,10 +13,10 @@ wrote 1048576/1048576 bytes at offset 3145728
>  [{ "start": 0, "length": 16777216, "depth": 1, "zero": true, "data": false}]
>  wrote 1048576/1048576 bytes at offset 5242880
>  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -2048/2048 sectors allocated at offset 5 MiB
> +1048576/1048576 bytes allocated at offset 5 MiB
>  wrote 1048576/1048576 bytes at offset 7340032
>  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> -2048/2048 sectors allocated at offset 7 MiB
> +1048576/1048576 bytes allocated at offset 7 MiB
>  No errors were found on the image.
>  No errors were found on the image.
>  *** done
> diff --git a/tests/qemu-iotests/common.pattern b/tests/qemu-iotests/common.pattern
> index ddfbca1..34f4a8d 100644
> --- a/tests/qemu-iotests/common.pattern
> +++ b/tests/qemu-iotests/common.pattern
> @@ -18,7 +18,7 @@
>
>  function do_is_allocated() {
>      local start=$1
> -    local size=$(( $2 / 512))
> +    local size=$2
>      local step=$3
>      local count=$4
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length
  2017-04-11  2:37   ` Philippe Mathieu-Daudé
@ 2017-04-11 12:11     ` Eric Blake
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11 12:11 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel; +Cc: kwolf, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

On 04/10/2017 09:37 PM, Philippe Mathieu-Daudé wrote:
> Hi Eric,
> 
> On 04/10/2017 10:17 PM, Eric Blake wrote:
>> For the 'alloc' command, accepting an offset in bytes but a length
>> in sectors, and reporting output in sectors, is confusing.  Do
>> everything in bytes, and adjust the expected output accordingly.
>>

>>          }
>>      } else {
>> -        nb_sectors = 1;
>> +        bytes = BDRV_SECTOR_SIZE;
>>      }
>> +    if (bytes & 0x1ff) {
> 
> This macro is self-explanatory:
> 
> if(!QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE))

Indeed; but I was going for copy-and-paste consistency with the rest of
the file that also open-codes this.  A separate cleanup patch for all of
them would qualify as trivial.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count Eric Blake
@ 2017-04-11 22:12   ` Eric Blake
  2017-04-11 22:15   ` [Qemu-devel] [PATCH v9.5 07/13] fixup! " Eric Blake
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11 22:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

On 04/10/2017 08:17 PM, Eric Blake wrote:
> Passing a byte offset, but sector count, when we ultimately
> want to operate on cluster granularity, is madness.  Clean up
> the external interfaces to take both offset and count as bytes,
> while still keeping the assertion added previously that the
> caller must align the values to a cluster.  Then rename things
> to make sure backports don't get confused by changed units:
> instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
> we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
> 
> The internal functions still operate on clusters at a time, and
> return an int for number of cleared clusters; but on an image
> with 2M clusters, a single L2 table holds 256k entries that each
> represent a 2M cluster, totalling well over INT_MAX bytes if we
> ever had a request for that many bytes at once.  All our callers
> currently limit themselves to 32-bit bytes (and therefore fewer
> clusters), but by making this function 64-bit clean, we have one
> less place to clean up if we later improve the block layer to
> support 64-bit bytes through all operations (with the block layer
> auto-fragmenting on behalf of more-limited drivers), rather than
> the current state where some interfaces are artificially limited
> to INT_MAX at a time.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 

> +int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
> +                          uint64_t bytes, int flags)
>  {
>      BDRVQcow2State *s = bs->opaque;
>      uint64_t end_offset;
>      uint64_t nb_clusters;
> +    int64_t cleared;
>      int ret;
> 
> -    end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
> +    end_offset = offset + bytes;
> 
>      /* Caller must pass aligned values, except at image end */
>      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
>      assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
>             end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> +    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));

And I promptly botched my rebasing.  qemu-iotests 154 caught my mistake
(the whole point of calculating end_offset is because bytes need not be
cluster-aligned if end_offset is at EOF). I'll send the obvious fixup.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v9.5 07/13] fixup! qcow2: Discard/zero clusters by byte count
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count Eric Blake
  2017-04-11 22:12   ` Eric Blake
@ 2017-04-11 22:15   ` Eric Blake
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-11 22:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, mreitz, qemu-block

Squash this to keep iotest 154 happy.

Signed-off-by: Eric Blake <eblake@redhat.com>

---
v9.5: fixup
v9: rebase to earlier changes, drop R-b
v7, v8: only earlier half of series submitted for 2.9
v6: rebase due to context
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
 block/qcow2-cluster.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 595e6e3..bc59cb8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1597,18 +1597,15 @@ int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
                           uint64_t bytes, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
-    uint64_t end_offset;
+    uint64_t end_offset = offset + bytes;
     uint64_t nb_clusters;
     int64_t cleared;
     int ret;

-    end_offset = offset + bytes;
-
     /* Caller must pass aligned values, except at image end */
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
            end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
-    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));

     /* The zero flag is only supported by version 3 and newer; we
      * require the use of that flag if there is a backing file or if
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file Eric Blake
@ 2017-04-12  9:49   ` Kevin Wolf
  2017-04-12 13:32     ` Eric Blake
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Kevin Wolf @ 2017-04-12  9:49 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, mreitz

Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
> 'qemu-img map' already coalesces information about unallocated
> clusters (those with status 0) and pure zero clusters (those
> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
> qcow2 images with no backing file already report all unallocated
> clusters (in the preallocation sense of clusters with no offset)
> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
> to external users), thanks to generic block layer code in
> bdrv_co_get_block_status().
> 
> So, for an image with no backing file, having bdrv_pwrite_zeroes
> mark clusters as unallocated (defer to backing file) rather than
> reads-as-zero (regardless of backing file) makes no difference
> to normal behavior, but may potentially allow for fewer writes to
> the L2 table of a freshly-created image where the L2 table is
> initially written to all-zeroes (although I don't actually know
> if we skip an L2 update and flush when re-writing the same
> contents as already present).

I don't get this. Allocating a cluster always involves an L2 update, no
matter whether it was previously unallocated or a zero cluster.

> Furthermore, this matches the behavior of discard_single_l2(), in
> favoring an unallocated cluster over a zero cluster when full
> discard is requested.

The only use for "full discard" is qcow2_make_empty(). It explicitly
requests that the backing file becomes visible again. This is a
completely different case.

In other words, in order to stay consistent between discard and
write_zeroes from a guest POV, we need to leave this code alone.

> Meanwhile, version 2 qcow2 files (compat=0.10) lack support for an
> explicit zero cluster.  This minor tweak therefore allows us to turn
> write zeroes with unmap into an actual unallocation on those files,
> where they used to return -ENOTSUP and cause an allocation due to
> the fallback to explicitly written zeroes.

Okay, this is true.

But I doubt that making write_zeroes more efficient on v2 images without
a backing file is really worth any extra complexity at this point...

Kevin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-04-12  9:49   ` Kevin Wolf
@ 2017-04-12 13:32     ` Eric Blake
  2017-04-21 18:42     ` Eric Blake
  2017-05-11 14:56     ` Eric Blake
  2 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-12 13:32 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 4409 bytes --]

On 04/12/2017 04:49 AM, Kevin Wolf wrote:
> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>> 'qemu-img map' already coalesces information about unallocated
>> clusters (those with status 0) and pure zero clusters (those
>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>> qcow2 images with no backing file already report all unallocated
>> clusters (in the preallocation sense of clusters with no offset)
>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>> to external users), thanks to generic block layer code in
>> bdrv_co_get_block_status().
>>
>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>> mark clusters as unallocated (defer to backing file) rather than
>> reads-as-zero (regardless of backing file) makes no difference
>> to normal behavior, but may potentially allow for fewer writes to
>> the L2 table of a freshly-created image where the L2 table is
>> initially written to all-zeroes (although I don't actually know
>> if we skip an L2 update and flush when re-writing the same
>> contents as already present).
> 
> I don't get this. Allocating a cluster always involves an L2 update, no
> matter whether it was previously unallocated or a zero cluster.

But we're NOT allocating a cluster.  Either we are unmapping a cluster,
or are we are changing a pure unallocated cluster (bdrv_get_block_status
of 0) to a read-as-zero but unallocated cluster (status 1) [different
from the 'bdrv_is_allocated' sense of an allocated cluster being any
cluster determined by the current layer instead of by its backing file -
I really wish we wouldn't overload 'allocated' to two different
meanings, but am not sure which of the two meanings would be better to
rename, and to what].  My argument is that for a brand new image with no
backing file, all the clusters are pure unallocated, and that a write
zero with unmapping is doing more work by writing 0 => 1 than it would
by leaving 0 => 0, with identical results.


> 
>> Furthermore, this matches the behavior of discard_single_l2(), in
>> favoring an unallocated cluster over a zero cluster when full
>> discard is requested.
> 
> The only use for "full discard" is qcow2_make_empty(). It explicitly
> requests that the backing file becomes visible again. This is a
> completely different case.
> 
> In other words, in order to stay consistent between discard and
> write_zeroes from a guest POV, we need to leave this code alone.

I already argued that from a guest POV, the code is identical.  The only
situation that I changed is the situation where there is no backing
file, so the guest is guaranteed to read zeros from the cluster whether
the cluster is marked pure unallocated (0) or as reads-as-zero
unallocated (1).  The guest already requested unmapping, so they don't
need the preallocated-reads-as-zero (1 | offset).

Note that I absolutely agree that we must NOT make this change when
there is a backing file - a guest request to write zeroes MUST use
reads-as-zero (1) or preallocated reads-as-zero (1 | offset), and not
pure unallocated, unless we can guarantee that falling back to the
backing file will also read as zeroes, but taking the time to learn the
backing file status is not worth the expense.

> 
>> Meanwhile, version 2 qcow2 files (compat=0.10) lack support for an
>> explicit zero cluster.  This minor tweak therefore allows us to turn
>> write zeroes with unmap into an actual unallocation on those files,
>> where they used to return -ENOTSUP and cause an allocation due to
>> the fallback to explicitly written zeroes.
> 
> Okay, this is true.
> 
> But I doubt that making write_zeroes more efficient on v2 images without
> a backing file is really worth any extra complexity at this point...

Is it just my commit message that is wrong?  The code itself is fairly
short to review (I spent a lot more time on the commit message than on
the code - and it looks like I still need more time on the commit
message).  And later patches in the series DO depend on this, so I'm not
ready to just drop the patch yet.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-04-12  9:49   ` Kevin Wolf
  2017-04-12 13:32     ` Eric Blake
@ 2017-04-21 18:42     ` Eric Blake
  2017-05-11 14:56     ` Eric Blake
  2 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-04-21 18:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, mreitz

[-- Attachment #1: Type: text/plain, Size: 3323 bytes --]

On 04/12/2017 04:49 AM, Kevin Wolf wrote:
>> Furthermore, this matches the behavior of discard_single_l2(), in
>> favoring an unallocated cluster over a zero cluster when full
>> discard is requested.
> 
> The only use for "full discard" is qcow2_make_empty(). It explicitly
> requests that the backing file becomes visible again. This is a
> completely different case.

No, let's look at that code - I'm not referring to the
qcow2_make_empty() case, but to the bdrv_pdiscard() case:


        /*
         * If full_discard is false, make sure that a discarded area
reads back
         * as zeroes for v3 images (we cannot do it for v2 without actually
         * writing a zero-filled buffer). We can skip the operation if the
         * cluster is already marked as zero, or if it's unallocated and we
         * don't have a backing file.
         *
         * TODO We might want to use bdrv_get_block_status(bs) here, but
we're
         * holding s->lock, so that doesn't work today.
         *
         * If full_discard is true, the sector should not read back as
zeroes,
         * but rather fall through to the backing file.
         */
        switch (qcow2_get_cluster_type(old_l2_entry)) {
            case QCOW2_CLUSTER_UNALLOCATED:
                if (full_discard || !bs->backing) {
                    continue;
                }
                break;


That says: if our cluster is currently unallocated, and we have no
backing file, then LEAVE it unallocated (rather than marking it
QCOW2_CLUSTER_ZERO), even through bdrv_pdiscard.

Meanwhile, the next lines:

            case QCOW2_CLUSTER_ZERO:
                if (!full_discard) {
                    continue;
                }

state that if we are doing bdrv_pdiscard(), and the cluster already
reads as zeroes, then minimize the churn by not changing the cluster
(and that in turn means we do NOT lose a preallocation).

> 
> In other words, in order to stay consistent between discard and
> write_zeroes from a guest POV, we need to leave this code alone.

Again, my argument is that if we have no backing file, and the cluster
is previously unallocated, then leaving it unallocated is equivalent to
changing it to QCOW2_CLUSTER_ZERO.  There is no difference to the guest
POV by this change.

> 
>> Meanwhile, version 2 qcow2 files (compat=0.10) lack support for an
>> explicit zero cluster.  This minor tweak therefore allows us to turn
>> write zeroes with unmap into an actual unallocation on those files,
>> where they used to return -ENOTSUP and cause an allocation due to
>> the fallback to explicitly written zeroes.
> 
> Okay, this is true.
> 
> But I doubt that making write_zeroes more efficient on v2 images without
> a backing file is really worth any extra complexity at this point...

My argument is that it is not much complexity, and that making this
change coupled with the new qemu-iotest 179 in patch 2/13 (which fails
without this change) enabled me to test qemu-io -c 'w -z -u' (which
nothing else was testing) and ensure that I'm not regression when the
rest of the series further tweaks things to be byte-based.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-04-12  9:49   ` Kevin Wolf
  2017-04-12 13:32     ` Eric Blake
  2017-04-21 18:42     ` Eric Blake
@ 2017-05-11 14:56     ` Eric Blake
  2017-05-11 15:18       ` Eric Blake
  2017-05-12 16:06       ` Max Reitz
  2 siblings, 2 replies; 26+ messages in thread
From: Eric Blake @ 2017-05-11 14:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, mreitz, John Snow

[-- Attachment #1: Type: text/plain, Size: 3873 bytes --]

[revisiting this older patch version, even though the final version in
today's pull request changed somewhat from this approach]

On 04/12/2017 04:49 AM, Kevin Wolf wrote:
> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>> 'qemu-img map' already coalesces information about unallocated
>> clusters (those with status 0) and pure zero clusters (those
>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>> qcow2 images with no backing file already report all unallocated
>> clusters (in the preallocation sense of clusters with no offset)
>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>> to external users), thanks to generic block layer code in
>> bdrv_co_get_block_status().
>>
>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>> mark clusters as unallocated (defer to backing file) rather than
>> reads-as-zero (regardless of backing file) makes no difference
>> to normal behavior, but may potentially allow for fewer writes to
>> the L2 table of a freshly-created image where the L2 table is
>> initially written to all-zeroes (although I don't actually know
>> if we skip an L2 update and flush when re-writing the same
>> contents as already present).
> 
> I don't get this. Allocating a cluster always involves an L2 update, no
> matter whether it was previously unallocated or a zero cluster.

On IRC, John, Kevin, and I were discussing the current situation with
libvirt NBD storage migration.  When libvirt creates a file on the
destination (rather than the user pre-creating it), it currently
defaults to 0.10 [v2] images, even if the source was a 1.1 image [v3]
(arguably something that could be improved in libvirt, but
https://bugzilla.redhat.com/show_bug.cgi?id=1371749 was closed as not a
bug).

Therefore, the use case of doing a mirror job to a v2 image, and having
that image become thick even though the source was thin, is happening
more than we'd like
(https://bugzilla.redhat.com/show_bug.cgi?id=1371749).  While Kevin had
a point that in the common case we ALWAYS want to turn an unallocated
cluster into a zero cluster (so that we don't have to audit whether all
callers are properly accounting for the case where a backing image is
added later), our conversation on IRC today conceded that we may want to
introduce a new BDRV_REQ_READ_ZERO_NOP (or some such name) that
particular callers can use to request that if a cluster already reads as
zeroes, the write zero request does NOT have to change it.  Normal guest
operations would not use the flag, but mirroring IS a case that would
use the flag, so that we can end up with thinner mirrors even to 0.10
images.

The other consideration is that on 0.10 images, even if we have to
allocate, right now our allocation is done by way of failing with
-ENOTSUP and falling back to the normal pwrite() of explicit zeroes.  It
may be worth teaching the qcow2 layer to explicitly handle write zeroes,
even on 0.10 images, by allocating a cluster (as needed) but then
telling bs->file to write zeroes (punching a hole as appropriate) so
that the file is still thin.  In fact, it matches the fact that we
already have code that probes whether a qcow2 cluster that reports
BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID then probes bs->file to see if
there is a hole there, at which point it can add BDRV_BLOCK_ZERO to the
bdrv_get_block_status.

I don't know which one of us will tackle patches along these lines, but
wanted to at least capture the gist of the IRC conversation in the
archives for a bit more permanence.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-05-11 14:56     ` Eric Blake
@ 2017-05-11 15:18       ` Eric Blake
  2017-05-12 16:06       ` Max Reitz
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Blake @ 2017-05-11 15:18 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, qemu-block, mreitz, John Snow

[-- Attachment #1: Type: text/plain, Size: 960 bytes --]

On 05/11/2017 09:56 AM, Eric Blake wrote:
> 
> On IRC, John, Kevin, and I were discussing the current situation with
> libvirt NBD storage migration.  When libvirt creates a file on the
> destination (rather than the user pre-creating it), it currently
> defaults to 0.10 [v2] images, even if the source was a 1.1 image [v3]
> (arguably something that could be improved in libvirt, but
> https://bugzilla.redhat.com/show_bug.cgi?id=1371749 was closed as not a
> bug).
> 
> Therefore, the use case of doing a mirror job to a v2 image, and having
> that image become thick even though the source was thin, is happening
> more than we'd like
> (https://bugzilla.redhat.com/show_bug.cgi?id=1371749).  While Kevin had

Bah, pasted the wrong link.  Correct one is:
https://bugzilla.redhat.com/show_bug.cgi?id=1219541

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-05-11 14:56     ` Eric Blake
  2017-05-11 15:18       ` Eric Blake
@ 2017-05-12 16:06       ` Max Reitz
  2017-05-12 23:00         ` John Snow
  1 sibling, 1 reply; 26+ messages in thread
From: Max Reitz @ 2017-05-12 16:06 UTC (permalink / raw)
  To: Eric Blake, Kevin Wolf; +Cc: qemu-devel, qemu-block, John Snow

[-- Attachment #1: Type: text/plain, Size: 4674 bytes --]

On 2017-05-11 16:56, Eric Blake wrote:
> [revisiting this older patch version, even though the final version in
> today's pull request changed somewhat from this approach]
> 
> On 04/12/2017 04:49 AM, Kevin Wolf wrote:
>> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>>> 'qemu-img map' already coalesces information about unallocated
>>> clusters (those with status 0) and pure zero clusters (those
>>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>>> qcow2 images with no backing file already report all unallocated
>>> clusters (in the preallocation sense of clusters with no offset)
>>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>>> to external users), thanks to generic block layer code in
>>> bdrv_co_get_block_status().
>>>
>>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>>> mark clusters as unallocated (defer to backing file) rather than
>>> reads-as-zero (regardless of backing file) makes no difference
>>> to normal behavior, but may potentially allow for fewer writes to
>>> the L2 table of a freshly-created image where the L2 table is
>>> initially written to all-zeroes (although I don't actually know
>>> if we skip an L2 update and flush when re-writing the same
>>> contents as already present).
>>
>> I don't get this. Allocating a cluster always involves an L2 update, no
>> matter whether it was previously unallocated or a zero cluster.
> 
> On IRC, John, Kevin, and I were discussing the current situation with
> libvirt NBD storage migration.  When libvirt creates a file on the
> destination (rather than the user pre-creating it), it currently
> defaults to 0.10 [v2] images, even if the source was a 1.1 image [v3]
> (arguably something that could be improved in libvirt, but
> https://bugzilla.redhat.com/show_bug.cgi?id=1371749 was closed as not a
> bug).
> 
> Therefore, the use case of doing a mirror job to a v2 image, and having
> that image become thick even though the source was thin, is happening
> more than we'd like
> (https://bugzilla.redhat.com/show_bug.cgi?id=1371749).  While Kevin had
> a point that in the common case we ALWAYS want to turn an unallocated
> cluster into a zero cluster (so that we don't have to audit whether all
> callers are properly accounting for the case where a backing image is
> added later), our conversation on IRC today conceded that we may want to
> introduce a new BDRV_REQ_READ_ZERO_NOP (or some such name) that
> particular callers can use to request that if a cluster already reads as
> zeroes, the write zero request does NOT have to change it.  Normal guest
> operations would not use the flag, but mirroring IS a case that would
> use the flag, so that we can end up with thinner mirrors even to 0.10
> images.
> 
> The other consideration is that on 0.10 images, even if we have to
> allocate, right now our allocation is done by way of failing with
> -ENOTSUP and falling back to the normal pwrite() of explicit zeroes.  It
> may be worth teaching the qcow2 layer to explicitly handle write zeroes,
> even on 0.10 images, by allocating a cluster (as needed) but then
> telling bs->file to write zeroes (punching a hole as appropriate) so
> that the file is still thin.  In fact, it matches the fact that we
> already have code that probes whether a qcow2 cluster that reports
> BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID then probes bs->file to see if
> there is a hole there, at which point it can add BDRV_BLOCK_ZERO to the
> bdrv_get_block_status.
> 
> I don't know which one of us will tackle patches along these lines, but
> wanted to at least capture the gist of the IRC conversation in the
> archives for a bit more permanence.

Just short ideas:

(1) I do consider it a bug if v2 images are created. The BZ wasn't
closed for a very good reason, but because nobody answered this question:

> is this really a problem?

And apparently it is.

(2) There is a way of creating zero clusters on v2 images, but we've
never done it (yet): You create a single data cluster containing zeroes
and then you just point to it whenever you need a zero cluster (until
its refcount overflows, at which point you allocate another cluster).
Maybe that helps.

I'd be really careful with optimizations for v2 images. You have already
proven to me that my fears of too much complexity are out of proportions
sometimes, but still. v2 upstream is old legacy and should be treated as
such, at least IMHO.

Max



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 498 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-05-12 16:06       ` Max Reitz
@ 2017-05-12 23:00         ` John Snow
  2017-05-15 18:35           ` Max Reitz
  0 siblings, 1 reply; 26+ messages in thread
From: John Snow @ 2017-05-12 23:00 UTC (permalink / raw)
  To: Max Reitz, Eric Blake, Kevin Wolf; +Cc: qemu-devel, qemu-block



On 05/12/2017 12:06 PM, Max Reitz wrote:
> On 2017-05-11 16:56, Eric Blake wrote:
>> [revisiting this older patch version, even though the final version in
>> today's pull request changed somewhat from this approach]
>>
>> On 04/12/2017 04:49 AM, Kevin Wolf wrote:
>>> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>>>> 'qemu-img map' already coalesces information about unallocated
>>>> clusters (those with status 0) and pure zero clusters (those
>>>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>>>> qcow2 images with no backing file already report all unallocated
>>>> clusters (in the preallocation sense of clusters with no offset)
>>>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>>>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>>>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>>>> to external users), thanks to generic block layer code in
>>>> bdrv_co_get_block_status().
>>>>
>>>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>>>> mark clusters as unallocated (defer to backing file) rather than
>>>> reads-as-zero (regardless of backing file) makes no difference
>>>> to normal behavior, but may potentially allow for fewer writes to
>>>> the L2 table of a freshly-created image where the L2 table is
>>>> initially written to all-zeroes (although I don't actually know
>>>> if we skip an L2 update and flush when re-writing the same
>>>> contents as already present).
>>>
>>> I don't get this. Allocating a cluster always involves an L2 update, no
>>> matter whether it was previously unallocated or a zero cluster.
>>
>> On IRC, John, Kevin, and I were discussing the current situation with
>> libvirt NBD storage migration.  When libvirt creates a file on the
>> destination (rather than the user pre-creating it), it currently
>> defaults to 0.10 [v2] images, even if the source was a 1.1 image [v3]
>> (arguably something that could be improved in libvirt, but
>> https://bugzilla.redhat.com/show_bug.cgi?id=1371749 was closed as not a
>> bug).
>>
>> Therefore, the use case of doing a mirror job to a v2 image, and having
>> that image become thick even though the source was thin, is happening
>> more than we'd like
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1371749).  While Kevin had
>> a point that in the common case we ALWAYS want to turn an unallocated
>> cluster into a zero cluster (so that we don't have to audit whether all
>> callers are properly accounting for the case where a backing image is
>> added later), our conversation on IRC today conceded that we may want to
>> introduce a new BDRV_REQ_READ_ZERO_NOP (or some such name) that
>> particular callers can use to request that if a cluster already reads as
>> zeroes, the write zero request does NOT have to change it.  Normal guest
>> operations would not use the flag, but mirroring IS a case that would
>> use the flag, so that we can end up with thinner mirrors even to 0.10
>> images.
>>
>> The other consideration is that on 0.10 images, even if we have to
>> allocate, right now our allocation is done by way of failing with
>> -ENOTSUP and falling back to the normal pwrite() of explicit zeroes.  It
>> may be worth teaching the qcow2 layer to explicitly handle write zeroes,
>> even on 0.10 images, by allocating a cluster (as needed) but then
>> telling bs->file to write zeroes (punching a hole as appropriate) so
>> that the file is still thin.  In fact, it matches the fact that we
>> already have code that probes whether a qcow2 cluster that reports
>> BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID then probes bs->file to see if
>> there is a hole there, at which point it can add BDRV_BLOCK_ZERO to the
>> bdrv_get_block_status.
>>
>> I don't know which one of us will tackle patches along these lines, but
>> wanted to at least capture the gist of the IRC conversation in the
>> archives for a bit more permanence.
> 
> Just short ideas:
> 
> (1) I do consider it a bug if v2 images are created. The BZ wasn't
> closed for a very good reason, but because nobody answered this question:
> 
>> is this really a problem?
> 
> And apparently it is.
> 
> (2) There is a way of creating zero clusters on v2 images, but we've
> never done it (yet): You create a single data cluster containing zeroes
> and then you just point to it whenever you need a zero cluster (until
> its refcount overflows, at which point you allocate another cluster).
> Maybe that helps.
> 
> I'd be really careful with optimizations for v2 images. You have already
> proven to me that my fears of too much complexity are out of proportions
> sometimes, but still. v2 upstream is old legacy and should be treated as
> such, at least IMHO.
> 
> Max
> 
> 

I agree that V2 changes should be limited in nature, but there are
less-fiddly ways to have thin 0.10 images on sparse filesystems. This
way feels a little too clever and mostly likely too intrusive for an old
format.

I'd also think that it'd probably confuse older copies of qemu-img
check, so maybe this is too hacky of a solution.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file
  2017-05-12 23:00         ` John Snow
@ 2017-05-15 18:35           ` Max Reitz
  0 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2017-05-15 18:35 UTC (permalink / raw)
  To: John Snow, Eric Blake, Kevin Wolf; +Cc: qemu-devel, qemu-block

[-- Attachment #1: Type: text/plain, Size: 7964 bytes --]

On 2017-05-13 01:00, John Snow wrote:
> 
> 
> On 05/12/2017 12:06 PM, Max Reitz wrote:
>> On 2017-05-11 16:56, Eric Blake wrote:
>>> [revisiting this older patch version, even though the final version in
>>> today's pull request changed somewhat from this approach]
>>>
>>> On 04/12/2017 04:49 AM, Kevin Wolf wrote:
>>>> Am 11.04.2017 um 03:17 hat Eric Blake geschrieben:
>>>>> 'qemu-img map' already coalesces information about unallocated
>>>>> clusters (those with status 0) and pure zero clusters (those
>>>>> with status BDRV_BLOCK_ZERO and no offset).  Furthermore, all
>>>>> qcow2 images with no backing file already report all unallocated
>>>>> clusters (in the preallocation sense of clusters with no offset)
>>>>> as BDRV_BLOCK_ZERO, regardless of whether the QCOW_OFLAG_ZERO was
>>>>> set in that L2 entry (QCOW_OFLAG_ZERO also implies a return of
>>>>> BDRV_BLOCK_ALLOCATED, but we intentionally do not expose that bit
>>>>> to external users), thanks to generic block layer code in
>>>>> bdrv_co_get_block_status().
>>>>>
>>>>> So, for an image with no backing file, having bdrv_pwrite_zeroes
>>>>> mark clusters as unallocated (defer to backing file) rather than
>>>>> reads-as-zero (regardless of backing file) makes no difference
>>>>> to normal behavior, but may potentially allow for fewer writes to
>>>>> the L2 table of a freshly-created image where the L2 table is
>>>>> initially written to all-zeroes (although I don't actually know
>>>>> if we skip an L2 update and flush when re-writing the same
>>>>> contents as already present).
>>>>
>>>> I don't get this. Allocating a cluster always involves an L2 update, no
>>>> matter whether it was previously unallocated or a zero cluster.
>>>
>>> On IRC, John, Kevin, and I were discussing the current situation with
>>> libvirt NBD storage migration.  When libvirt creates a file on the
>>> destination (rather than the user pre-creating it), it currently
>>> defaults to 0.10 [v2] images, even if the source was a 1.1 image [v3]
>>> (arguably something that could be improved in libvirt, but
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1371749 was closed as not a
>>> bug).
>>>
>>> Therefore, the use case of doing a mirror job to a v2 image, and having
>>> that image become thick even though the source was thin, is happening
>>> more than we'd like
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1371749).  While Kevin had
>>> a point that in the common case we ALWAYS want to turn an unallocated
>>> cluster into a zero cluster (so that we don't have to audit whether all
>>> callers are properly accounting for the case where a backing image is
>>> added later), our conversation on IRC today conceded that we may want to
>>> introduce a new BDRV_REQ_READ_ZERO_NOP (or some such name) that
>>> particular callers can use to request that if a cluster already reads as
>>> zeroes, the write zero request does NOT have to change it.  Normal guest
>>> operations would not use the flag, but mirroring IS a case that would
>>> use the flag, so that we can end up with thinner mirrors even to 0.10
>>> images.

Reading this again, you could just pass through the write-zero request.
The issue right now is that the qcow2 driver just returns -ENOTSUP for
every write-zero request on v2 images when it could just create a data
cluster and invoke a write-zero request on the data.

Just not suppressing the BDRV_REQ_ZERO_WRITE flag for the generic
fall-back zero write (with bs->supported_write_flags listing it) and
then issuing a bdrv_co_pwrite_zeroes() in qcow2_co_pwritev() works (with
BDRV_REQ_MAY_UNMAP).

(Whether or not to set BDRV_REQ_MAY_UNMAP should be derived from...
Maybe QCOW2_DISCARD_OTHER?)

>>> The other consideration is that on 0.10 images, even if we have to
>>> allocate, right now our allocation is done by way of failing with
>>> -ENOTSUP and falling back to the normal pwrite() of explicit zeroes.  It
>>> may be worth teaching the qcow2 layer to explicitly handle write zeroes,
>>> even on 0.10 images, by allocating a cluster (as needed) but then
>>> telling bs->file to write zeroes (punching a hole as appropriate) so
>>> that the file is still thin.  In fact, it matches the fact that we
>>> already have code that probes whether a qcow2 cluster that reports
>>> BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID then probes bs->file to see if
>>> there is a hole there, at which point it can add BDRV_BLOCK_ZERO to the
>>> bdrv_get_block_status.
>>>
>>> I don't know which one of us will tackle patches along these lines, but
>>> wanted to at least capture the gist of the IRC conversation in the
>>> archives for a bit more permanence.
>>
>> Just short ideas:
>>
>> (1) I do consider it a bug if v2 images are created. The BZ wasn't
>> closed for a very good reason, but because nobody answered this question:
>>
>>> is this really a problem?
>>
>> And apparently it is.
>>
>> (2) There is a way of creating zero clusters on v2 images, but we've
>> never done it (yet): You create a single data cluster containing zeroes
>> and then you just point to it whenever you need a zero cluster (until
>> its refcount overflows, at which point you allocate another cluster).
>> Maybe that helps.
>>
>> I'd be really careful with optimizations for v2 images. You have already
>> proven to me that my fears of too much complexity are out of proportions
>> sometimes, but still. v2 upstream is old legacy and should be treated as
>> such, at least IMHO.
>>
>> Max
>>
>>
> 
> I agree that V2 changes should be limited in nature, but there are
> less-fiddly ways to have thin 0.10 images on sparse filesystems. This
> way feels a little too clever and mostly likely too intrusive for an old
> format.

Yes.

> I'd also think that it'd probably confuse older copies of qemu-img
> check, so maybe this is too hacky of a solution.

I don't.

Anyway, feel free to convince me by writing a simple patch that allows
the v2 sparseness you need.

Let's again get to the root of the issue: Mirroring to a new v2 qcow2
image does not create a sparse image. There are multiple questions here:

(1) Who does mirroring?
(2) Who uses v2 qcow2 for the destination?
(3) Who needs sparse images?

(1) is clear, everybody does it. (3) is rather clear, it would be nice
to have (but not an absolute catastrophe if we didn't). So what remains
is (2).

From qemu's perspective, qcow2v3 has been the default for quite some
time. So just using qemu, nobody will use v2.

Maybe you're using an old qemu, maybe you're using RHEL 6's qemu which
only supports v2. This is an argument, but I'd very much just recommend
an update. (Bit of an internal question: Are you willing to backport the
upstream changes this would bring to RHEL 6?)

And then there's the case the BZ is about: Maybe you're using a
management tool which forces v2 images. I'd call it a bug in the
management tool because we didn't make v3 the default for nothing. Maybe
there are good reasons why some management tools create v2, but I don't
know any and when the BZ was closed none were given. It was just closed
because there was no apparent reason not to (but there is!).


So I see the following courses of action:

1. Ignore the v2 "issue". It's not an upstream issue, it's a bug in a
management tool, so let's just reopen the BZ and ignore RHEL 6 users.
Also, v2 just is a limited format, there is a reason v3 exists! So I
wouldn't even call it an issue, it's just how it is.

2. Improve v2. Would "fix" the issue at hand while still not letting
RHEL 7's libvirt create v3 images (which might very well come back to
haunt us some other day), and would complicate upstream code for no
upstream reason.

Now, as I said, I'm open to be convinced that (2) is viable: If the
change is simple enough, sure, let's take it, why not. But I don't see
the point still.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 498 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-05-15 18:35 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11  1:17 [Qemu-devel] [PATCH for-2.10 v9 00/13] add blkdebug tests Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 01/13] qcow2: Unallocate unmapped zero clusters if no backing file Eric Blake
2017-04-12  9:49   ` Kevin Wolf
2017-04-12 13:32     ` Eric Blake
2017-04-21 18:42     ` Eric Blake
2017-05-11 14:56     ` Eric Blake
2017-05-11 15:18       ` Eric Blake
2017-05-12 16:06       ` Max Reitz
2017-05-12 23:00         ` John Snow
2017-05-15 18:35           ` Max Reitz
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 02/13] iotests: Add test 179 to cover write zeroes with unmap Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 03/13] qemu-io: Switch 'alloc' command to byte-based length Eric Blake
2017-04-11  2:37   ` Philippe Mathieu-Daudé
2017-04-11 12:11     ` Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 04/13] qemu-io: Switch 'map' output to byte-based reporting Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 05/13] qcow2: Optimize write zero of unaligned tail cluster Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 06/13] qcow2: Assert that cluster operations are aligned Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 07/13] qcow2: Discard/zero clusters by byte count Eric Blake
2017-04-11 22:12   ` Eric Blake
2017-04-11 22:15   ` [Qemu-devel] [PATCH v9.5 07/13] fixup! " Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 08/13] blkdebug: Sanity check block layer guarantees Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 09/13] blkdebug: Refactor error injection Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 10/13] blkdebug: Add pass-through write_zero and discard support Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 11/13] blkdebug: Simplify override logic Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 12/13] blkdebug: Add ability to override unmap geometries Eric Blake
2017-04-11  1:17 ` [Qemu-devel] [PATCH v9 13/13] tests: Add coverage for recent block geometry fixes Eric Blake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.