[Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements
@ 2016-05-26  3:48 Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always Eric Blake
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Eric Blake @ 2016-05-26  3:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf

This series improves write_zeroes for qcow2

Since the work conflicts with my proposed patches to switch
write_zeroes to a byte-base interface, I figured I'd fix the
bugs and get this part nailed first, then rebase my other
work on top, rather than making Denis have to do the dirty work.

Changes from v2:
- patch 1: close to a rewrite, same concept but in fewer lines
of code, and with testsuite change to back it up as valid; but
keep authorship
- patch 2, 3: unchanged
- patch 4 from original series already applied
- patch 4 in this series is new
- patch 5: rewrite of the original v5 that catches even more
cases; claim authorship

[hmm, maybe I should treat 1 and 5 the same on whether to leave
authorship unchanged or just credit Denis for the original idea]

Denis V. Lunev (3):
  block: split write_zeroes always
  qcow2: simplify logic in qcow2_co_write_zeroes
  qcow2: add tracepoints for qcow2_co_write_zeroes

Eric Blake (2):
  qemu-iotests: Test one more spot for optimizing write_zeroes
  qcow2: Catch more unaligned write_zero into zero cluster

 block/io.c                 | 30 ++++++++++++---------
 block/qcow2.c              | 67 ++++++++++++++++++++--------------------------
 tests/qemu-iotests/154     | 40 +++++++++++++++++++++++++++
 tests/qemu-iotests/154.out | 55 ++++++++++++++++++++++++++++++++-----
 trace-events               |  2 ++
 5 files changed, 137 insertions(+), 57 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always
  2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
@ 2016-05-26  3:48 ` Eric Blake
  2016-05-26  8:51   ` Fam Zheng
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 2/5] qcow2: simplify logic in qcow2_co_write_zeroes Eric Blake
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Eric Blake @ 2016-05-26  3:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, Stefan Hajnoczi, Fam Zheng, Max Reitz

From: "Denis V. Lunev" <den@openvz.org>

We should split requests even if they are less than write_zeroes_alignment.
For example we can have the following request:
  offset 62k
  size   4k
  write_zeroes_alignment 64k
The original code sent 1 request covering 2 qcow2 clusters, and resulted
in both clusters being allocated. But by splitting the request, we can
cater to the case where one of the two clusters can be zeroed as a
whole, for only 1 cluster allocated after the operation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1463476543-3087-2-git-send-email-den@openvz.org>

[eblake: Avoid exceeding nb_sectors, hoist alignment checks out of
loop, and update testsuite to show that patch works]

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/io.c                 | 30 +++++++++++++++++-------------
 tests/qemu-iotests/154.out | 18 ++++++++++++------
 2 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/block/io.c b/block/io.c
index f474b9a..26b5845 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1118,28 +1118,32 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     struct iovec iov = {0};
     int ret = 0;
     bool need_flush = false;
+    int head = 0;
+    int tail = 0;

     int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
                                         BDRV_REQUEST_MAX_SECTORS);
+    if (bs->bl.write_zeroes_alignment) {
+        assert(is_power_of_2(bs->bl.write_zeroes_alignment));
+        head = sector_num & (bs->bl.write_zeroes_alignment - 1);
+        tail = (sector_num + nb_sectors) & (bs->bl.write_zeroes_alignment - 1);
+        max_write_zeroes &= ~(bs->bl.write_zeroes_alignment - 1);
+    }

     while (nb_sectors > 0 && !ret) {
         int num = nb_sectors;

         /* Align request.  Block drivers can expect the "bulk" of the request
-         * to be aligned.
+         * to be aligned, and that unaligned requests do not cross cluster
+         * boundaries.
          */
-        if (bs->bl.write_zeroes_alignment
-            && num > bs->bl.write_zeroes_alignment) {
-            if (sector_num % bs->bl.write_zeroes_alignment != 0) {
-                /* Make a small request up to the first aligned sector.  */
-                num = bs->bl.write_zeroes_alignment;
-                num -= sector_num % bs->bl.write_zeroes_alignment;
-            } else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
-                /* Shorten the request to the last aligned sector.  num cannot
-                 * underflow because num > bs->bl.write_zeroes_alignment.
-                 */
-                num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
-            }
+        if (head) {
+            /* Make a small request up to the first aligned sector.  */
+            num = MIN(nb_sectors, bs->bl.write_zeroes_alignment - head);
+            head = 0;
+        } else if (tail && num > bs->bl.write_zeroes_alignment) {
+            /* Shorten the request to the last aligned sector.  */
+            num -= tail;
         }

         /* limit request size */
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index 8946b73..b9d27c5 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -106,11 +106,14 @@ read 1024/1024 bytes at offset 67584
 read 5120/5120 bytes at offset 68608
 5 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 36864, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 49152, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 49152, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 53248, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 36864},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": false}]

 == spanning two clusters, non-zero after request ==
@@ -145,11 +148,14 @@ read 7168/7168 bytes at offset 65536
 read 1024/1024 bytes at offset 72704
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": true, "data": false},
+{ "start": 36864, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
 { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 49152, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 49152, "length": 4096, "depth": 0, "zero": true, "data": false},
+{ "start": 53248, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
 { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 36864},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": true, "data": false},
+{ "start": 69632, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
 { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": false}]

 == spanning two clusters, partially overwriting backing file ==
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 2/5] qcow2: simplify logic in qcow2_co_write_zeroes
  2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always Eric Blake
@ 2016-05-26  3:48 ` Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 3/5] qcow2: add tracepoints for qcow2_co_write_zeroes Eric Blake
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Eric Blake @ 2016-05-26  3:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, Max Reitz

From: "Denis V. Lunev" <den@openvz.org>

Unaligned requests will occupy only one cluster. This is true since the
previous commit. Simplify the code taking this consideration into
account.

In other words, the caller is now buggy if it ever passes us an unaligned
request that crosses cluster boundaries (the only requests that can cross
boundaries will be aligned).

There are no other changes so far.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1463476543-3087-3-git-send-email-den@openvz.org>
---
 block/qcow2.c | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c9306a7..2f73201 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2438,33 +2438,20 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
     int tail = (sector_num + nb_sectors) % s->cluster_sectors;

     if (head != 0 || tail != 0) {
-        int64_t cl_end = -1;
+        int64_t cl_start = sector_num - head;

-        sector_num -= head;
-        nb_sectors += head;
+        assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);

-        if (tail != 0) {
-            nb_sectors += s->cluster_sectors - tail;
-        }
+        sector_num = cl_start;
+        nb_sectors = s->cluster_sectors;

         if (!is_zero_cluster(bs, sector_num)) {
             return -ENOTSUP;
         }

-        if (nb_sectors > s->cluster_sectors) {
-            /* Technically the request can cover 2 clusters, f.e. 4k write
-               at s->cluster_sectors - 2k offset. One of these cluster can
-               be zeroed, one unallocated */
-            cl_end = sector_num + nb_sectors - s->cluster_sectors;
-            if (!is_zero_cluster(bs, cl_end)) {
-                return -ENOTSUP;
-            }
-        }
-
         qemu_co_mutex_lock(&s->lock);
         /* We can have new write after previous check */
-        if (!is_zero_cluster_top_locked(bs, sector_num) ||
-                (cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) {
+        if (!is_zero_cluster_top_locked(bs, sector_num)) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 3/5] qcow2: add tracepoints for qcow2_co_write_zeroes
  2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 2/5] qcow2: simplify logic in qcow2_co_write_zeroes Eric Blake
@ 2016-05-26  3:48 ` Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 4/5] qemu-iotests: Test one more spot for optimizing write_zeroes Eric Blake
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Eric Blake @ 2016-05-26  3:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, Max Reitz

From: "Denis V. Lunev" <den@openvz.org>

This patch follows guidelines of all other tracepoints in qcow2, like ones
in qcow2_co_writev. I think that they should dump values in the same
quantities or be changed all together.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Message-Id: <1463476543-3087-4-git-send-email-den@openvz.org>
[eblake: typo fix in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c | 5 +++++
 trace-events  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2f73201..105fd5e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2437,6 +2437,9 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
     int head = sector_num % s->cluster_sectors;
     int tail = (sector_num + nb_sectors) % s->cluster_sectors;

+    trace_qcow2_write_zeroes_start_req(qemu_coroutine_self(), sector_num,
+                                       nb_sectors);
+
     if (head != 0 || tail != 0) {
         int64_t cl_start = sector_num - head;

@@ -2459,6 +2462,8 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
         qemu_co_mutex_lock(&s->lock);
     }

+    trace_qcow2_write_zeroes(qemu_coroutine_self(), sector_num, nb_sectors);
+
     /* Whatever is left can use real zero clusters */
     ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
     qemu_co_mutex_unlock(&s->lock);
diff --git a/trace-events b/trace-events
index 4450d8f..46726cc 100644
--- a/trace-events
+++ b/trace-events
@@ -612,6 +612,8 @@ qcow2_writev_done_req(void *co, int ret) "co %p ret %d"
 qcow2_writev_start_part(void *co) "co %p"
 qcow2_writev_done_part(void *co, int cur_nr_sectors) "co %p cur_nr_sectors %d"
 qcow2_writev_data(void *co, uint64_t offset) "co %p offset %" PRIx64
+qcow2_write_zeroes_start_req(void *co, int64_t sector, int nb_sectors) "co %p sector %" PRIx64 " nb_sectors %d"
+qcow2_write_zeroes(void *co, int64_t sector, int nb_sectors) "co %p sector %" PRIx64 " nb_sectors %d"

 # block/qcow2-cluster.c
 qcow2_alloc_clusters_offset(void *co, uint64_t offset, int num) "co %p offset %" PRIx64 " num %d"
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 4/5] qemu-iotests: Test one more spot for optimizing write_zeroes
  2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
                   ` (2 preceding siblings ...)
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 3/5] qcow2: add tracepoints for qcow2_co_write_zeroes Eric Blake
@ 2016-05-26  3:48 ` Eric Blake
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster Eric Blake
  2016-06-02 10:20 ` [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Kevin Wolf
  5 siblings, 0 replies; 13+ messages in thread
From: Eric Blake @ 2016-05-26  3:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, Max Reitz

Add another test to 154, showing that we currently allocate a
data cluster in the top layer if any sector of the backing file
was allocated.  The next patch will optimize this case.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/154     | 40 ++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/154.out | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index 23f1b3a..5905c55 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -115,6 +115,46 @@ $QEMU_IO -c "read -P 0 40k 3k" "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

 echo
+echo == write_zeroes covers non-zero data ==
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $size
+_make_test_img -b "$TEST_IMG.base"
+
+# non-zero data at front of request
+# Backing file: -- XX -- --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 5k 1k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 5k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 4k 4k" "$TEST_IMG" | _filter_qemu_io
+
+# non-zero data at end of request
+# Backing file: -- -- XX --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 14k 1k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 13k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 12k 4k" "$TEST_IMG" | _filter_qemu_io
+
+# non-zero data matches size of request
+# Backing file: -- XX XX --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 21k 2k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 21k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 20k 4k" "$TEST_IMG" | _filter_qemu_io
+
+# non-zero data smaller than request
+# Backing file: -- -X X- --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 30208 1k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 29k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 28k 4k" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+echo
 echo == spanning two clusters, non-zero before request ==

 CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $size
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index b9d27c5..531fd8c 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -74,6 +74,43 @@ read 3072/3072 bytes at offset 40960
 { "start": 40960, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
 { "start": 45056, "length": 134172672, "depth": 1, "zero": true, "data": false}]

+== write_zeroes covers non-zero data ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 1024/1024 bytes at offset 5120
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 5120
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 4096
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 14336
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 13312
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 12288
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 21504
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 21504
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 20480
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 30208
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 29696
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 28672
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 4096, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 8192, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 12288, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 16384, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 20480, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 24576, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 28672, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 32768},
+{ "start": 32768, "length": 134184960, "depth": 1, "zero": true, "data": false}]
+
 == spanning two clusters, non-zero before request ==
 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster
  2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
                   ` (3 preceding siblings ...)
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 4/5] qemu-iotests: Test one more spot for optimizing write_zeroes Eric Blake
@ 2016-05-26  3:48 ` Eric Blake
  2016-05-26 13:41   ` Denis V. Lunev
  2016-05-26 14:56   ` Denis V. Lunev
  2016-06-02 10:20 ` [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Kevin Wolf
  5 siblings, 2 replies; 13+ messages in thread
From: Eric Blake @ 2016-05-26  3:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, Max Reitz

is_zero_cluster() and is_zero_cluster_top_locked() are used only
by qcow2_co_write_zeroes().  The former is too broad (we don't
care if the sectors we are about to overwrite are non-zero, only
that all other sectors in the cluster are zero), so it needs to
be called up to twice but with smaller limits - rename it along
with adding the neeeded parameter.  The latter can be inlined for
more compact code.

The testsuite change shows that we now have a sparser top file
when an unaligned write_zeroes overwrites the only portion of
the backing file with data.

Based on a patch proposal by Denis V. Lunev.

CC: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2.c              | 47 +++++++++++++++++++++++-----------------------
 tests/qemu-iotests/154.out |  8 ++++----
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 105fd5e..ecac399 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2406,26 +2406,19 @@ finish:
 }


-static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
+static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
+                            uint32_t count)
 {
-    BDRVQcow2State *s = bs->opaque;
     int nr;
     BlockDriverState *file;
-    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
-                                              s->cluster_sectors, &nr, &file);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == s->cluster_sectors;
-}
+    int64_t res;

-static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start)
-{
-    BDRVQcow2State *s = bs->opaque;
-    int nr = s->cluster_sectors;
-    uint64_t off;
-    int ret;
-
-    ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
-    assert(nr == s->cluster_sectors);
-    return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO;
+    if (!count) {
+        return true;
+    }
+    res = bdrv_get_block_status_above(bs, NULL, start, count,
+                                      &nr, &file);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
 }

 static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
@@ -2434,27 +2427,33 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
     int ret;
     BDRVQcow2State *s = bs->opaque;

-    int head = sector_num % s->cluster_sectors;
-    int tail = (sector_num + nb_sectors) % s->cluster_sectors;
+    uint32_t head = sector_num % s->cluster_sectors;
+    uint32_t tail = (sector_num + nb_sectors) % s->cluster_sectors;

     trace_qcow2_write_zeroes_start_req(qemu_coroutine_self(), sector_num,
                                        nb_sectors);

-    if (head != 0 || tail != 0) {
+    if (head || tail) {
         int64_t cl_start = sector_num - head;
+        uint64_t off;
+        int nr;

         assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);

-        sector_num = cl_start;
-        nb_sectors = s->cluster_sectors;
-
-        if (!is_zero_cluster(bs, sector_num)) {
+        /* check whether remainder of cluster already reads as zero */
+        if (!(is_zero_sectors(bs, cl_start, head) &&
+              is_zero_sectors(bs, sector_num + nb_sectors,
+                              -tail & (s->cluster_sectors - 1)))) {
             return -ENOTSUP;
         }

         qemu_co_mutex_lock(&s->lock);
         /* We can have new write after previous check */
-        if (!is_zero_cluster_top_locked(bs, sector_num)) {
+        sector_num = cl_start;
+        nb_sectors = nr = s->cluster_sectors;
+        ret = qcow2_get_cluster_offset(bs, cl_start << BDRV_SECTOR_BITS,
+                                       &nr, &off);
+        if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_ZERO) {
             qemu_co_mutex_unlock(&s->lock);
             return -ENOTSUP;
         }
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index 531fd8c..da9eabd 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -102,13 +102,13 @@ wrote 2048/2048 bytes at offset 29696
 read 4096/4096 bytes at offset 28672
 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 4096, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 8192, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 12288, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 16384, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 20480, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 20480, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 24576, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 28672, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 32768},
+{ "start": 28672, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 32768, "length": 134184960, "depth": 1, "zero": true, "data": false}]

 == spanning two clusters, non-zero before request ==
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always Eric Blake
@ 2016-05-26  8:51   ` Fam Zheng
  0 siblings, 0 replies; 13+ messages in thread
From: Fam Zheng @ 2016-05-26  8:51 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, den, kwolf, Stefan Hajnoczi, Max Reitz

On Wed, 05/25 21:48, Eric Blake wrote:
> From: "Denis V. Lunev" <den@openvz.org>
> 
> We should split requests even if they are less than write_zeroes_alignment.
> For example we can have the following request:
>   offset 62k
>   size   4k
>   write_zeroes_alignment 64k
> The original code sent 1 request covering 2 qcow2 clusters, and resulted
> in both clusters being allocated. But by splitting the request, we can
> cater to the case where one of the two clusters can be zeroed as a
> whole, for only 1 cluster allocated after the operation.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Eric Blake <eblake@redhat.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> Message-Id: <1463476543-3087-2-git-send-email-den@openvz.org>
> 
> [eblake: Avoid exceeding nb_sectors, hoist alignment checks out of
> loop, and update testsuite to show that patch works]
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/io.c                 | 30 +++++++++++++++++-------------
>  tests/qemu-iotests/154.out | 18 ++++++++++++------
>  2 files changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index f474b9a..26b5845 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1118,28 +1118,32 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>      struct iovec iov = {0};
>      int ret = 0;
>      bool need_flush = false;
> +    int head = 0;
> +    int tail = 0;
> 
>      int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
>                                          BDRV_REQUEST_MAX_SECTORS);
> +    if (bs->bl.write_zeroes_alignment) {
> +        assert(is_power_of_2(bs->bl.write_zeroes_alignment));
> +        head = sector_num & (bs->bl.write_zeroes_alignment - 1);
> +        tail = (sector_num + nb_sectors) & (bs->bl.write_zeroes_alignment - 1);
> +        max_write_zeroes &= ~(bs->bl.write_zeroes_alignment - 1);
> +    }
> 
>      while (nb_sectors > 0 && !ret) {
>          int num = nb_sectors;
> 
>          /* Align request.  Block drivers can expect the "bulk" of the request
> -         * to be aligned.
> +         * to be aligned, and that unaligned requests do not cross cluster
> +         * boundaries.
>           */
> -        if (bs->bl.write_zeroes_alignment
> -            && num > bs->bl.write_zeroes_alignment) {
> -            if (sector_num % bs->bl.write_zeroes_alignment != 0) {
> -                /* Make a small request up to the first aligned sector.  */
> -                num = bs->bl.write_zeroes_alignment;
> -                num -= sector_num % bs->bl.write_zeroes_alignment;
> -            } else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 0) {
> -                /* Shorten the request to the last aligned sector.  num cannot
> -                 * underflow because num > bs->bl.write_zeroes_alignment.
> -                 */
> -                num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
> -            }
> +        if (head) {
> +            /* Make a small request up to the first aligned sector.  */
> +            num = MIN(nb_sectors, bs->bl.write_zeroes_alignment - head);
> +            head = 0;
> +        } else if (tail && num > bs->bl.write_zeroes_alignment) {
> +            /* Shorten the request to the last aligned sector.  */
> +            num -= tail;
>          }
> 
>          /* limit request size */
> diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
> index 8946b73..b9d27c5 100644
> --- a/tests/qemu-iotests/154.out
> +++ b/tests/qemu-iotests/154.out
> @@ -106,11 +106,14 @@ read 1024/1024 bytes at offset 67584
>  read 5120/5120 bytes at offset 68608
>  5 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
> -{ "start": 32768, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 20480},
> +{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
> +{ "start": 36864, "length": 4096, "depth": 0, "zero": true, "data": false},
>  { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
> -{ "start": 49152, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 28672},
> +{ "start": 49152, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
> +{ "start": 53248, "length": 4096, "depth": 0, "zero": true, "data": false},
>  { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
> -{ "start": 65536, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 36864},
> +{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
> +{ "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
>  { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": false}]
> 
>  == spanning two clusters, non-zero after request ==
> @@ -145,11 +148,14 @@ read 7168/7168 bytes at offset 65536
>  read 1024/1024 bytes at offset 72704
>  1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>  [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
> -{ "start": 32768, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 20480},
> +{ "start": 32768, "length": 4096, "depth": 0, "zero": true, "data": false},
> +{ "start": 36864, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
>  { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
> -{ "start": 49152, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 28672},
> +{ "start": 49152, "length": 4096, "depth": 0, "zero": true, "data": false},
> +{ "start": 53248, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
>  { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
> -{ "start": 65536, "length": 8192, "depth": 0, "zero": false, "data": true, "offset": 36864},
> +{ "start": 65536, "length": 4096, "depth": 0, "zero": true, "data": false},
> +{ "start": 69632, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
>  { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": false}]
> 
>  == spanning two clusters, partially overwriting backing file ==
> -- 
> 2.5.5
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster Eric Blake
@ 2016-05-26 13:41   ` Denis V. Lunev
  2016-05-26 14:35     ` Eric Blake
  2016-05-26 14:56   ` Denis V. Lunev
  1 sibling, 1 reply; 13+ messages in thread
From: Denis V. Lunev @ 2016-05-26 13:41 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

On 05/26/2016 06:48 AM, Eric Blake wrote:
> is_zero_cluster() and is_zero_cluster_top_locked() are used only
> by qcow2_co_write_zeroes().  The former is too broad (we don't
> care if the sectors we are about to overwrite are non-zero, only
> that all other sectors in the cluster are zero), so it needs to
> be called up to twice but with smaller limits - rename it along
> with adding the neeeded parameter.  The latter can be inlined for
> more compact code.
>
> The testsuite change shows that we now have a sparser top file
> when an unaligned write_zeroes overwrites the only portion of
> the backing file with data.
>
> Based on a patch proposal by Denis V. Lunev.
>
> CC: Denis V. Lunev <den@openvz.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/qcow2.c              | 47 +++++++++++++++++++++++-----------------------
>   tests/qemu-iotests/154.out |  8 ++++----
>   2 files changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 105fd5e..ecac399 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2406,26 +2406,19 @@ finish:
>   }
>
>
> -static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
> +static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
> +                            uint32_t count)
>   {
> -    BDRVQcow2State *s = bs->opaque;
>       int nr;
>       BlockDriverState *file;
> -    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
> -                                              s->cluster_sectors, &nr, &file);
> -    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == s->cluster_sectors;
> -}
> +    int64_t res;
>
> -static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start)
> -{
> -    BDRVQcow2State *s = bs->opaque;
> -    int nr = s->cluster_sectors;
> -    uint64_t off;
> -    int ret;
> -
> -    ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
> -    assert(nr == s->cluster_sectors);
> -    return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO;
> +    if (!count) {
> +        return true;
> +    }
> +    res = bdrv_get_block_status_above(bs, NULL, start, count,
> +                                      &nr, &file);
> +    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
>   }
>
>   static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
> @@ -2434,27 +2427,33 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
>       int ret;
>       BDRVQcow2State *s = bs->opaque;
>
> -    int head = sector_num % s->cluster_sectors;
> -    int tail = (sector_num + nb_sectors) % s->cluster_sectors;
> +    uint32_t head = sector_num % s->cluster_sectors;
> +    uint32_t tail = (sector_num + nb_sectors) % s->cluster_sectors;
>
>       trace_qcow2_write_zeroes_start_req(qemu_coroutine_self(), sector_num,
>                                          nb_sectors);
>
> -    if (head != 0 || tail != 0) {
> +    if (head || tail) {
>           int64_t cl_start = sector_num - head;
> +        uint64_t off;
> +        int nr;
>
>           assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);
>
> -        sector_num = cl_start;
> -        nb_sectors = s->cluster_sectors;
> -
> -        if (!is_zero_cluster(bs, sector_num)) {
> +        /* check whether remainder of cluster already reads as zero */
> +        if (!(is_zero_sectors(bs, cl_start, head) &&
> +              is_zero_sectors(bs, sector_num + nb_sectors,
> +                              -tail & (s->cluster_sectors - 1)))) {

can we have cluster_sectors != 2^n?
In this case this bits logic will be broken.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster
  2016-05-26 13:41   ` Denis V. Lunev
@ 2016-05-26 14:35     ` Eric Blake
  2016-06-02 10:14       ` Kevin Wolf
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Blake @ 2016-05-26 14:35 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2305 bytes --]

On 05/26/2016 07:41 AM, Denis V. Lunev wrote:
> On 05/26/2016 06:48 AM, Eric Blake wrote:
>> is_zero_cluster() and is_zero_cluster_top_locked() are used only
>> by qcow2_co_write_zeroes().  The former is too broad (we don't
>> care if the sectors we are about to overwrite are non-zero, only
>> that all other sectors in the cluster are zero), so it needs to
>> be called up to twice but with smaller limits - rename it along
>> with adding the neeeded parameter.  The latter can be inlined for
>> more compact code.
>>
>> The testsuite change shows that we now have a sparser top file
>> when an unaligned write_zeroes overwrites the only portion of
>> the backing file with data.
>>
>> Based on a patch proposal by Denis V. Lunev.
>>

>> -
>> -        if (!is_zero_cluster(bs, sector_num)) {
>> +        /* check whether remainder of cluster already reads as zero */
>> +        if (!(is_zero_sectors(bs, cl_start, head) &&
>> +              is_zero_sectors(bs, sector_num + nb_sectors,
>> +                              -tail & (s->cluster_sectors - 1)))) {
> 
> can we have cluster_sectors != 2^n?

No. cluster_sectors is an inherent property of the qcow2 file format:


         20 - 23:   cluster_bits
                    Number of bits that are used for addressing an offset
                    within a cluster (1 << cluster_bits is the cluster
size).
                    Must not be less than 9 (i.e. 512 byte clusters).


As the file format uses a bit shift value, you are guaranteed to have a
power of two amount of sectors within a cluster.

If you prefer, I could have written '-tail % s->cluster_sectors', but as
% on a negative signed integer gives different results than what you get
for an unsigned number, I felt that & was nicer than % for making it
more obvious that I'm grabbing particular bits.

If you can think of any cleaner expression that represents the number of
sectors occurring after the tail until the next cluster boundary, I'm
game; the hardest part is that when tail is 0, we want the number passed
to is_zero_sectors() to also be 0, not s->cluster_sectors (so the naive
's->cluster_sectors - tail' is wrong).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster Eric Blake
  2016-05-26 13:41   ` Denis V. Lunev
@ 2016-05-26 14:56   ` Denis V. Lunev
  1 sibling, 0 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-05-26 14:56 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, kwolf, Max Reitz

On 05/26/2016 06:48 AM, Eric Blake wrote:
> is_zero_cluster() and is_zero_cluster_top_locked() are used only
> by qcow2_co_write_zeroes().  The former is too broad (we don't
> care if the sectors we are about to overwrite are non-zero, only
> that all other sectors in the cluster are zero), so it needs to
> be called up to twice but with smaller limits - rename it along
> with adding the neeeded parameter.  The latter can be inlined for
> more compact code.
>
> The testsuite change shows that we now have a sparser top file
> when an unaligned write_zeroes overwrites the only portion of
> the backing file with data.
>
> Based on a patch proposal by Denis V. Lunev.
>
> CC: Denis V. Lunev <den@openvz.org>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/qcow2.c              | 47 +++++++++++++++++++++++-----------------------
>   tests/qemu-iotests/154.out |  8 ++++----
>   2 files changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 105fd5e..ecac399 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2406,26 +2406,19 @@ finish:
>   }
>
>
> -static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
> +static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
> +                            uint32_t count)
>   {
> -    BDRVQcow2State *s = bs->opaque;
>       int nr;
>       BlockDriverState *file;
> -    int64_t res = bdrv_get_block_status_above(bs, NULL, start,
> -                                              s->cluster_sectors, &nr, &file);
> -    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == s->cluster_sectors;
> -}
> +    int64_t res;
>
> -static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start)
> -{
> -    BDRVQcow2State *s = bs->opaque;
> -    int nr = s->cluster_sectors;
> -    uint64_t off;
> -    int ret;
> -
> -    ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
> -    assert(nr == s->cluster_sectors);
> -    return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO;
> +    if (!count) {
> +        return true;
> +    }
> +    res = bdrv_get_block_status_above(bs, NULL, start, count,
> +                                      &nr, &file);
> +    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
>   }
>
>   static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
> @@ -2434,27 +2427,33 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
>       int ret;
>       BDRVQcow2State *s = bs->opaque;
>
> -    int head = sector_num % s->cluster_sectors;
> -    int tail = (sector_num + nb_sectors) % s->cluster_sectors;
> +    uint32_t head = sector_num % s->cluster_sectors;
> +    uint32_t tail = (sector_num + nb_sectors) % s->cluster_sectors;
>
>       trace_qcow2_write_zeroes_start_req(qemu_coroutine_self(), sector_num,
>                                          nb_sectors);
>
> -    if (head != 0 || tail != 0) {
> +    if (head || tail) {
>           int64_t cl_start = sector_num - head;
> +        uint64_t off;
> +        int nr;
>
>           assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);
>
> -        sector_num = cl_start;
> -        nb_sectors = s->cluster_sectors;
> -
> -        if (!is_zero_cluster(bs, sector_num)) {
> +        /* check whether remainder of cluster already reads as zero */
> +        if (!(is_zero_sectors(bs, cl_start, head) &&
> +              is_zero_sectors(bs, sector_num + nb_sectors,
> +                              -tail & (s->cluster_sectors - 1)))) {
>               return -ENOTSUP;
>           }
>
>           qemu_co_mutex_lock(&s->lock);
>           /* We can have new write after previous check */
> -        if (!is_zero_cluster_top_locked(bs, sector_num)) {
> +        sector_num = cl_start;
> +        nb_sectors = nr = s->cluster_sectors;
> +        ret = qcow2_get_cluster_offset(bs, cl_start << BDRV_SECTOR_BITS,
> +                                       &nr, &off);
> +        if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_ZERO) {
>               qemu_co_mutex_unlock(&s->lock);
>               return -ENOTSUP;
>           }
> diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
> index 531fd8c..da9eabd 100644
> --- a/tests/qemu-iotests/154.out
> +++ b/tests/qemu-iotests/154.out
> @@ -102,13 +102,13 @@ wrote 2048/2048 bytes at offset 29696
>   read 4096/4096 bytes at offset 28672
>   4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>   [{ "start": 0, "length": 4096, "depth": 1, "zero": true, "data": false},
> -{ "start": 4096, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
> +{ "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
>   { "start": 8192, "length": 4096, "depth": 1, "zero": true, "data": false},
> -{ "start": 12288, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
> +{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
>   { "start": 16384, "length": 4096, "depth": 1, "zero": true, "data": false},
> -{ "start": 20480, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
> +{ "start": 20480, "length": 4096, "depth": 0, "zero": true, "data": false},
>   { "start": 24576, "length": 4096, "depth": 1, "zero": true, "data": false},
> -{ "start": 28672, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 32768},
> +{ "start": 28672, "length": 4096, "depth": 0, "zero": true, "data": false},
>   { "start": 32768, "length": 134184960, "depth": 1, "zero": true, "data": false}]
>
>   == spanning two clusters, non-zero before request ==
Reviewed-by: Denis V. Lunev <den@openvz.org>

Thank you very much for pushing this!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster
  2016-05-26 14:35     ` Eric Blake
@ 2016-06-02 10:14       ` Kevin Wolf
  2016-06-02 12:33         ` Eric Blake
  0 siblings, 1 reply; 13+ messages in thread
From: Kevin Wolf @ 2016-06-02 10:14 UTC (permalink / raw)
  To: Eric Blake; +Cc: Denis V. Lunev, qemu-devel, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2682 bytes --]

Am 26.05.2016 um 16:35 hat Eric Blake geschrieben:
> On 05/26/2016 07:41 AM, Denis V. Lunev wrote:
> > On 05/26/2016 06:48 AM, Eric Blake wrote:
> >> is_zero_cluster() and is_zero_cluster_top_locked() are used only
> >> by qcow2_co_write_zeroes().  The former is too broad (we don't
> >> care if the sectors we are about to overwrite are non-zero, only
> >> that all other sectors in the cluster are zero), so it needs to
> >> be called up to twice but with smaller limits - rename it along
> >> with adding the neeeded parameter.  The latter can be inlined for
> >> more compact code.
> >>
> >> The testsuite change shows that we now have a sparser top file
> >> when an unaligned write_zeroes overwrites the only portion of
> >> the backing file with data.
> >>
> >> Based on a patch proposal by Denis V. Lunev.
> >>
> 
> >> -
> >> -        if (!is_zero_cluster(bs, sector_num)) {
> >> +        /* check whether remainder of cluster already reads as zero */
> >> +        if (!(is_zero_sectors(bs, cl_start, head) &&
> >> +              is_zero_sectors(bs, sector_num + nb_sectors,
> >> +                              -tail & (s->cluster_sectors - 1)))) {
> > 
> > can we have cluster_sectors != 2^n?
> 
> No. cluster_sectors is an inherent property of the qcow2 file format:
> 
> 
>          20 - 23:   cluster_bits
>                     Number of bits that are used for addressing an offset
>                     within a cluster (1 << cluster_bits is the cluster
> size).
>                     Must not be less than 9 (i.e. 512 byte clusters).
> 
> 
> As the file format uses a bit shift value, you are guaranteed to have a
> power of two amount of sectors within a cluster.
> 
> If you prefer, I could have written '-tail % s->cluster_sectors', but as
> % on a negative signed integer gives different results than what you get
> for an unsigned number, I felt that & was nicer than % for making it
> more obvious that I'm grabbing particular bits.
> 
> If you can think of any cleaner expression that represents the number of
> sectors occurring after the tail until the next cluster boundary, I'm
> game; the hardest part is that when tail is 0, we want the number passed
> to is_zero_sectors() to also be 0, not s->cluster_sectors (so the naive
> 's->cluster_sectors - tail' is wrong).

The obvious one would be translating your English into C:

    tail ? s->cluster_sectors - tail : 0

Another option which doesn't require an unsigned type would be
(s->cluster_sectors - tail) % s->cluster_sectors.

I'm okay with merging the "more interesting" version, though I must
admit that I had to read it twice.

Kevin

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements
  2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
                   ` (4 preceding siblings ...)
  2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster Eric Blake
@ 2016-06-02 10:20 ` Kevin Wolf
  5 siblings, 0 replies; 13+ messages in thread
From: Kevin Wolf @ 2016-06-02 10:20 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, den

Am 26.05.2016 um 05:48 hat Eric Blake geschrieben:
> This series improves write_zeroes for qcow2
> 
> Since the work conflicts with my proposed patches to switch
> write_zeroes to a byte-base interface, I figured I'd fix the
> bugs and get this part nailed first, then rebase my other
> work on top, rather than making Denis have to do the dirty work.

Thanks, applied to the block branch.

Kevin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster
  2016-06-02 10:14       ` Kevin Wolf
@ 2016-06-02 12:33         ` Eric Blake
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Blake @ 2016-06-02 12:33 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Denis V. Lunev, qemu-devel, qemu-block, Max Reitz, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]

On 06/02/2016 04:14 AM, Kevin Wolf wrote:

>> If you prefer, I could have written '-tail % s->cluster_sectors', but as
>> % on a negative signed integer gives different results than what you get
>> for an unsigned number, I felt that & was nicer than % for making it
>> more obvious that I'm grabbing particular bits.
>>
>> If you can think of any cleaner expression that represents the number of
>> sectors occurring after the tail until the next cluster boundary, I'm
>> game; the hardest part is that when tail is 0, we want the number passed
>> to is_zero_sectors() to also be 0, not s->cluster_sectors (so the naive
>> 's->cluster_sectors - tail' is wrong).
> 
> The obvious one would be translating your English into C:
> 
>     tail ? s->cluster_sectors - tail : 0

Would gcc optimize this into a bit operation rather than a branch?  If
not, that's a missed optimization bug that we should probably report
(that is, if gcc has enough information elsewhere that
s->cluster_sectors is a power of 2, since the bit operation optimization
depends on that fact).

> 
> Another option which doesn't require an unsigned type would be
> (s->cluster_sectors - tail) % s->cluster_sectors.

That's the same thing as '-tail % s->cluster_sectors', since the
distributive rule applies to modulo arithmetic, and since 'a % a' is 0
for non-zero 'a'.  So you can argue I was just saving typing :)

> 
> I'm okay with merging the "more interesting" version, though I must
> admit that I had to read it twice.

That's certainly your call as maintainer :)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-06-02 12:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26  3:48 [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Eric Blake
2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always Eric Blake
2016-05-26  8:51   ` Fam Zheng
2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 2/5] qcow2: simplify logic in qcow2_co_write_zeroes Eric Blake
2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 3/5] qcow2: add tracepoints for qcow2_co_write_zeroes Eric Blake
2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 4/5] qemu-iotests: Test one more spot for optimizing write_zeroes Eric Blake
2016-05-26  3:48 ` [Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster Eric Blake
2016-05-26 13:41   ` Denis V. Lunev
2016-05-26 14:35     ` Eric Blake
2016-06-02 10:14       ` Kevin Wolf
2016-06-02 12:33         ` Eric Blake
2016-05-26 14:56   ` Denis V. Lunev
2016-06-02 10:20 ` [Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.