All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements
@ 2017-08-01 14:18 Anton Nefedov
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 01/15] mirror: inherit supported write/zero flags Anton Nefedov
                   ` (15 more replies)
  0 siblings, 16 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Changes in v4:

  - rebased on top of Eric's series
    [PATCH v2 00/20] add byte-based block_status driver callbacks
    http://lists.nongnu.org/archive/html/qemu-devel/2017-07/msg04370.html

  - patch 5 fixed to compile without CONFIG_FALLOCATE and support
    BDRV_REQ_ALLOCATE with xfsctl enabled

  - add patches 1, 2 to support write/zero flags in mirror and blkverify
    drivers

========

Here goes a revisited series on qcow2 preallocation. It's probably a bit better
integrated this time and the amount of code is reduced significantly.

Changes in v3:
  - requests intersection detection from the previous versions is removed
    from qcow2 driver. Instead, tracked request infrastructure from the common
    block layer is used.

  - that made possible to omit preallocation for metadata writes.
    Those are rare and won't affect performance.

  - 'Simultaneous writes' feature (patches v2 12-15) dropped by now;
     Might worth a separate series.

  - various remarks to the previous version fixed

  - some iotests added

========

Changes in v2:
  - introduce new BDRV flag for write_zeroes()
      instead of using driver callback directly.
    Skipped introducing new functions like bdrv_co_pallocate() for now:
      1. it seems ok to keep calling this write_zeroes() as zeroes
      are expected;
      2. most of the code can be reused now anyway, so changes to
      write_zeroes() path are not significant
      3. write_zeroes() alignment and max-request limits can also be reused

    As a possible alternative we can have bdrv_co_pallocate() which can
    switch to pwrite_zeroes(,flags|=BDRV_REQ_ALLOCATE) early.

========

This pull request is to address a few performance problems of qcow2 format:

  1. non cluster-aligned write requests (to unallocated clusters) explicitly
    pad data with zeroes if there is no backing data. This can be avoided
    and the whole clusters are preallocated and zeroed in a single
    efficient write_zeroes() operation, also providing better host file
    continuity

  2. moreover, efficient write_zeroes() operation can be used to preallocate
    space megabytes ahead which gives noticeable improvement on some storage
    types (e.g. distributed storages where space allocation operation is
    expensive)

  /* omitted in v3: */
  3. preallocating/zeroing the clusters in advance makes possible to enable
    simultaneous writes to the same unallocated cluster, which is beneficial
    for parallel sequential write operations which are not cluster-aligned

Performance test results are added to commit messages (see patch 3, 12)

Anton Nefedov (12):
  mirror: inherit supported write/zero flags
  blkverify: set supported write/zero flags
  block: introduce BDRV_REQ_ALLOCATE flag
  block: treat BDRV_REQ_ALLOCATE as serialising
  file-posix: support BDRV_REQ_ALLOCATE
  block: support BDRV_REQ_ALLOCATE in passthrough drivers
  qcow2: set inactive flag
  qcow2: move is_zero() up
  qcow2: skip writing zero buffers to empty COW areas
  qcow2: allocate image space by-cluster
  iotest 190: test BDRV_REQ_ALLOCATE
  iotest 134: test cluster-misaligned encrypted write

Denis V. Lunev (2):
  qcow2: preallocation at image expand
  qcow2: truncate preallocated space

Pavel Butsykin (1):
  qcow2: check space leak at the end of the image

 include/block/block.h              |   6 +-
 include/block/block_int.h          |   2 +-
 block/qcow2.h                      |  18 ++++
 block/blkdebug.c                   |   3 +-
 block/blkverify.c                  |   9 ++
 block/file-posix.c                 |   8 ++
 block/io.c                         |  47 +++++++--
 block/mirror.c                     |   5 +
 block/qcow2-cluster.c              |  14 ++-
 block/qcow2-refcount.c             |   7 ++
 block/qcow2.c                      | 203 ++++++++++++++++++++++++++++++++-----
 block/raw-format.c                 |   3 +-
 block/trace-events                 |   1 +
 qemu-options.hx                    |   4 +
 tests/qemu-iotests/026.out         | 104 ++++++++++++++-----
 tests/qemu-iotests/026.out.nocache | 104 ++++++++++++++-----
 tests/qemu-iotests/029.out         |   5 +-
 tests/qemu-iotests/060             |   2 +-
 tests/qemu-iotests/060.out         |  13 ++-
 tests/qemu-iotests/061.out         |   5 +-
 tests/qemu-iotests/066             |   2 +-
 tests/qemu-iotests/066.out         |   9 +-
 tests/qemu-iotests/098.out         |   7 +-
 tests/qemu-iotests/108.out         |   5 +-
 tests/qemu-iotests/112.out         |   5 +-
 tests/qemu-iotests/134             |   9 ++
 tests/qemu-iotests/134.out         |  10 ++
 tests/qemu-iotests/190             | 146 ++++++++++++++++++++++++++
 tests/qemu-iotests/190.out         |  50 +++++++++
 tests/qemu-iotests/group           |   1 +
 30 files changed, 705 insertions(+), 102 deletions(-)
 create mode 100755 tests/qemu-iotests/190
 create mode 100644 tests/qemu-iotests/190.out

-- 
2.7.4

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 01/15] mirror: inherit supported write/zero flags
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
@ 2017-08-01 14:18 ` Anton Nefedov
  2017-08-04 15:13   ` [Qemu-devel] [PATCH v4 for-2.10 " Eric Blake
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 02/15] blkverify: set " Anton Nefedov
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/mirror.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index d46dace..7e539f1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1056,6 +1056,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
     bdrv_refresh_filename(bs->backing->bs);
     pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
             bs->backing->bs->filename);
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->backing->bs->supported_write_flags;
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->backing->bs->supported_zero_flags;
 }
 
 static void bdrv_mirror_top_close(BlockDriverState *bs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 02/15] blkverify: set supported write/zero flags
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 01/15] mirror: inherit supported write/zero flags Anton Nefedov
@ 2017-08-01 14:18 ` Anton Nefedov
  2017-08-04 15:23   ` Eric Blake
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/blkverify.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9..9ba65d0 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -140,6 +140,15 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags &
+        s->test_file->bs->supported_write_flags;
+
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags &
+        s->test_file->bs->supported_zero_flags;
+
     ret = 0;
 fail:
     qemu_opts_del(opts);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 01/15] mirror: inherit supported write/zero flags Anton Nefedov
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 02/15] blkverify: set " Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-04 19:51   ` Eric Blake
  2017-08-29 12:47   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 04/15] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

The flag is supposed to indicate that the region of the disk image has
to be sufficiently allocated so it reads as zeroes. The call with the flag
set has to return -ENOTSUP if allocation cannot be done efficiently
(i.e. without falling back to writing actual buffers)

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 include/block/block.h     |  6 +++++-
 include/block/block_int.h |  2 +-
 block/io.c                | 20 +++++++++++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 7fe0125..828da67 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -65,9 +65,13 @@ typedef enum {
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
+    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
+     * efficiently allocate the space so it reads as zeroes, or return an error.
+     */
+    BDRV_REQ_ALLOCATE           = 0x40,
 
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x3f,
+    BDRV_REQ_MASK               = 0x7f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9b94b32..9b64411 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -585,7 +585,7 @@ struct BlockDriverState {
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
     unsigned int supported_write_flags;
     /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
-     * BDRV_REQ_MAY_UNMAP) */
+     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
     unsigned int supported_zero_flags;
 
     /* the following member gives a name to every node on the bs graph. */
diff --git a/block/io.c b/block/io.c
index 375fc66..04d495e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1245,7 +1245,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
             assert(!bs->supported_zero_flags);
         }
 
-        if (ret == -ENOTSUP) {
+        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
 
@@ -1331,8 +1331,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
     if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
-        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
-        qemu_iovec_is_zero(qiov)) {
+        !(flags & BDRV_REQ_ZERO_WRITE) && !(flags & BDRV_REQ_ALLOCATE) &&
+        drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
         flags |= BDRV_REQ_ZERO_WRITE;
         if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
             flags |= BDRV_REQ_MAY_UNMAP;
@@ -1410,6 +1410,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
     assert(flags & BDRV_REQ_ZERO_WRITE);
     if (head_padding_bytes || tail_padding_bytes) {
+        if (flags & BDRV_REQ_ALLOCATE) {
+            return -ENOTSUP;
+        }
         buf = qemu_blockalign(bs, align);
         iov = (struct iovec) {
             .iov_base   = buf,
@@ -1508,6 +1511,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         return ret;
     }
 
+    /* allocation request with qiov provided doesn't make much sense */
+    assert(!(qiov && flags & BDRV_REQ_ALLOCATE));
+
     bdrv_inc_in_flight(bs);
     /*
      * Align write if necessary by performing a read-modify-write cycle.
@@ -1639,6 +1645,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
     trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
 
+    assert(!(flags & BDRV_REQ_MAY_UNMAP && flags & BDRV_REQ_ALLOCATE));
+
+    if (flags & BDRV_REQ_ALLOCATE &&
+        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+    {
+        return -ENOTSUP;
+    }
+
     if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 04/15] block: treat BDRV_REQ_ALLOCATE as serialising
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (2 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

The idea is that ALLOCATE requests may overlap with other requests.
Reuse the existing block layer infrastructure for serialising requests.
Use the following approach:
  - mark ALLOCATE serialising, so subsequent requests to the area wait
  - ALLOCATE request itself must never wait if another request is in flight
    already. Return EAGAIN, let the caller reconsider.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/io.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 04d495e..0a7a372 100644
--- a/block/io.c
+++ b/block/io.c
@@ -488,7 +488,8 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
     bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
+                                                   bool nowait)
 {
     BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
@@ -519,11 +520,14 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                  * will wait for us as soon as it wakes up, then just go on
                  * (instead of producing a deadlock in the former case). */
                 if (!req->waiting_for) {
+                    waited = true;
+                    if (nowait) {
+                        break;
+                    }
                     self->waiting_for = req;
                     qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
                     self->waiting_for = NULL;
                     retry = true;
-                    waited = true;
                     break;
                 }
             }
@@ -1027,7 +1031,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
     }
 
     if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
     }
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1321,7 +1325,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
-    waited = wait_serialising_requests(req);
+    waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
+    if (waited && flags & BDRV_REQ_ALLOCATE) {
+        return -EAGAIN;
+    }
     assert(!waited || !req->serialising);
     assert(req->overlap_offset <= offset);
     assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
@@ -1425,7 +1432,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
         /* RMW the unaligned part before head. */
         mark_request_serialising(req, align);
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_HEAD);
         ret = bdrv_aligned_preadv(child, req, offset & ~(align - 1), align,
                                   align, &local_qiov, 0);
@@ -1445,6 +1452,10 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         bytes -= zero_bytes;
     }
 
+    if (flags & BDRV_REQ_ALLOCATE) {
+        mark_request_serialising(req, align);
+    }
+
     assert(!bytes || (offset & (align - 1)) == 0);
     if (bytes >= align) {
         /* Write the aligned part in the middle. */
@@ -1463,7 +1474,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         assert(align == tail_padding_bytes + bytes);
         /* RMW the unaligned part after tail. */
         mark_request_serialising(req, align);
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_TAIL);
         ret = bdrv_aligned_preadv(child, req, offset, align,
                                   align, &local_qiov, 0);
@@ -1532,7 +1543,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         struct iovec head_iov;
 
         mark_request_serialising(&req, align);
-        wait_serialising_requests(&req);
+        wait_serialising_requests(&req, false);
 
         head_buf = qemu_blockalign(bs, align);
         head_iov = (struct iovec) {
@@ -1573,7 +1584,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         bool waited;
 
         mark_request_serialising(&req, align);
-        waited = wait_serialising_requests(&req);
+        waited = wait_serialising_requests(&req, false);
         assert(!waited || !use_local_qiov);
 
         tail_buf = qemu_blockalign(bs, align);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 05/15] file-posix: support BDRV_REQ_ALLOCATE
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (3 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 04/15] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-29 13:00   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Current write_zeroes implementation is good enough to satisfy this flag too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/file-posix.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 765a440..4ef1b1d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -539,7 +539,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
     if (S_ISREG(st.st_mode)) {
         s->discard_zeroes = true;
+#ifdef CONFIG_FALLOCATE
         s->has_fallocate = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
+#endif
     }
     if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -574,6 +577,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
         s->is_xfs = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
     }
 #endif
 
@@ -1388,6 +1392,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
         }
         s->has_fallocate = false;
     }
+
+    if (!s->has_fallocate) {
+        aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
+    }
 #endif
 
     return -ENOTSUP;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (4 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-29 13:07   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 07/15] qcow2: preallocation at image expand Anton Nefedov
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Support the flag if the underlying BDS supports it

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/blkdebug.c   | 3 ++-
 block/blkverify.c  | 2 +-
 block/mirror.c     | 2 +-
 block/raw-format.c | 3 ++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 234c8fb9..2ac3e12 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -414,7 +414,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
     ret = -EINVAL;
 
diff --git a/block/blkverify.c b/block/blkverify.c
index 9ba65d0..b249636 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -145,7 +145,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         s->test_file->bs->supported_write_flags;
 
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags &
         s->test_file->bs->supported_zero_flags;
 
diff --git a/block/mirror.c b/block/mirror.c
index 7e539f1..5510776 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1059,7 +1059,7 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->backing->bs->supported_write_flags;
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->backing->bs->supported_zero_flags;
 }
 
diff --git a/block/raw-format.c b/block/raw-format.c
index 22c7e98..434af74 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -417,7 +417,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     bs->sg = bs->file->bs->sg;
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
 
     if (bs->probed && !bdrv_is_read_only(bs)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 07/15] qcow2: preallocation at image expand
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (5 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag Anton Nefedov
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, den, kwolf, mreitz, eblake, Denis V. Lunev, Anton Nefedov

From: "Denis V. Lunev" <den@openvz.org>

This patch adds image preallocation at expand to provide better locality
of QCOW2 image file and optimize this procedure for some distributed
storage where this procedure is slow.

Preallocation is not issued upon writing metadata clusters.

Possible conflicts are resolved by the common block layer code since
ALLOCATE requests are serialising.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h   |  3 +++
 block/qcow2.c   | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 qemu-options.hx |  4 ++++
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 96a8d43..ebbb9cf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -102,6 +102,7 @@
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
 #define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
+#define QCOW2_OPT_PREALLOC_SIZE "prealloc-size"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -327,6 +328,8 @@ typedef struct BDRVQcow2State {
      * override) */
     char *image_backing_file;
     char *image_backing_format;
+
+    uint64_t prealloc_size;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
diff --git a/block/qcow2.c b/block/qcow2.c
index bcdd212..66aa8c2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -674,6 +674,11 @@ static QemuOptsList qcow2_runtime_opts = {
         },
         BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.",
             "ID of secret providing qcow2 AES key or LUKS passphrase"),
+        {
+            .name = QCOW2_OPT_PREALLOC_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Preallocation amount at image expand",
+        },
         { /* end of list */ }
     },
 };
@@ -1016,6 +1021,15 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
         goto fail;
     }
 
+    s->prealloc_size =
+        ROUND_UP(qemu_opt_get_size_del(opts, QCOW2_OPT_PREALLOC_SIZE, 0),
+                 s->cluster_size);
+    if (s->prealloc_size &&
+        !(bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+    {
+        s->prealloc_size = 0;
+    }
+
     ret = 0;
 fail:
     QDECREF(encryptopts);
@@ -1898,6 +1912,43 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
     return false;
 }
 
+/*
+ * If the specified area is beyond EOF, allocates it + prealloc_size
+ * bytes ahead.
+ */
+static void coroutine_fn handle_prealloc(BlockDriverState *bs,
+                                         const QCowL2Meta *m)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t start = m->alloc_offset;
+    uint64_t end = start + m->nb_clusters * s->cluster_size;
+    int64_t flen = bdrv_getlength(bs->file->bs);
+
+    if (flen < 0) {
+        return;
+    }
+
+    if (end > flen) {
+        /* try to alloc host space in one chunk for better locality */
+        bdrv_co_pwrite_zeroes(bs->file, flen,
+                              QEMU_ALIGN_UP(end + s->prealloc_size - flen,
+                                            s->cluster_size),
+                              BDRV_REQ_ALLOCATE);
+    }
+}
+
+static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        if (s->prealloc_size) {
+            handle_prealloc(bs, m);
+        }
+    }
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1982,24 +2033,31 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
+        qemu_co_mutex_unlock(&s->lock);
+
+        if (bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE) {
+            handle_alloc_space(bs, l2meta);
+        }
+
         /* If we need to do COW, check if it's possible to merge the
          * writing of the guest data together with that of the COW regions.
          * If it's not possible (or not necessary) then write the
          * guest data now. */
         if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
-            qemu_co_mutex_unlock(&s->lock);
             BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
             trace_qcow2_writev_data(qemu_coroutine_self(),
                                     cluster_offset + offset_in_cluster);
             ret = bdrv_co_pwritev(bs->file,
                                   cluster_offset + offset_in_cluster,
                                   cur_bytes, &hd_qiov, 0);
-            qemu_co_mutex_lock(&s->lock);
             if (ret < 0) {
+                qemu_co_mutex_lock(&s->lock);
                 goto fail;
             }
         }
 
+        qemu_co_mutex_lock(&s->lock);
+
         while (l2meta != NULL) {
             QCowL2Meta *next;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 2cc70b9..98e5136 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -758,6 +758,10 @@ occasions where a cluster gets freed (on/off; default: off)
 Which overlap checks to perform for writes to the image
 (none/constant/cached/all; default: cached). For details or finer
 granularity control refer to the QAPI documentation of @code{blockdev-add}.
+
+@item prealloc-size
+The number of bytes that will be preallocated ahead at qcow2 file expansion
+(allocating a new cluster beyond the end of file).
 @end table
 
 Example 1:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (6 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 07/15] qcow2: preallocation at image expand Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-04 20:00   ` Eric Blake
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 09/15] qcow2: truncate preallocated space Anton Nefedov
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Qcow2State and BlockDriverState flags have to be in sync

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 66aa8c2..2a1d2f2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2138,6 +2138,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
 
     if (result == 0) {
         qcow2_mark_clean(bs);
+        s->flags |= BDRV_O_INACTIVE;
     }
 
     return result;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 09/15] qcow2: truncate preallocated space
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (7 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 10/15] qcow2: check space leak at the end of the image Anton Nefedov
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, den, kwolf, mreitz, eblake, Denis V. Lunev, Anton Nefedov

From: "Denis V. Lunev" <den@openvz.org>

This could be done after calculation of the end of data and metadata in
the qcow2 image.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h          | 3 +++
 block/qcow2-cluster.c  | 9 +++++++++
 block/qcow2-refcount.c | 7 +++++++
 block/qcow2.c          | 7 +++++++
 4 files changed, 26 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index ebbb9cf..595ed9c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -330,6 +330,7 @@ typedef struct BDRVQcow2State {
     char *image_backing_format;
 
     uint64_t prealloc_size;
+    uint64_t data_end;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -669,4 +670,6 @@ void qcow2_remove_persistent_dirty_bitmap(BlockDriverState *bs,
                                           const char *name,
                                           Error **errp);
 
+void qcow2_update_data_end(BlockDriverState *bs, uint64_t off);
+
 #endif
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0185986..75baaf4 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2014,3 +2014,12 @@ fail:
     g_free(l1_table);
     return ret;
 }
+
+void qcow2_update_data_end(BlockDriverState *bs, uint64_t off)
+{
+    BDRVQcow2State *s = bs->opaque;
+
+    if (s->data_end < off) {
+        s->data_end = off;
+    }
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index c9b0dcb..d741a92 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -833,6 +833,9 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
             ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
             if (ret < 0) {
                 goto fail;
+            } else {
+                qcow2_update_data_end(bs, s->refcount_table_offset +
+                        s->refcount_table_size * sizeof(uint64_t));
             }
         }
         old_table_index = table_index;
@@ -954,6 +957,8 @@ retry:
         s->free_cluster_index - 1 > (INT64_MAX >> s->cluster_bits))
     {
         return -EFBIG;
+    } else {
+        qcow2_update_data_end(bs, s->free_cluster_index << s->cluster_bits);
     }
 
 #ifdef DEBUG_ALLOC2
@@ -1018,6 +1023,8 @@ int64_t qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 
     if (ret < 0) {
         return ret;
+    } else {
+        qcow2_update_data_end(bs, offset + (nb_clusters << s->cluster_bits));
     }
 
     return i;
diff --git a/block/qcow2.c b/block/qcow2.c
index 2a1d2f2..4696106 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1498,6 +1498,8 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
+    s->data_end = bdrv_getlength(bs->file->bs);
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -2138,6 +2140,11 @@ static int qcow2_inactivate(BlockDriverState *bs)
 
     if (result == 0) {
         qcow2_mark_clean(bs);
+
+        /* truncate preallocated space */
+        if (!bs->read_only && s->data_end < bdrv_getlength(bs->file->bs)) {
+            bdrv_truncate(bs->file, s->data_end, PREALLOC_MODE_OFF, NULL);
+        }
         s->flags |= BDRV_O_INACTIVE;
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 10/15] qcow2: check space leak at the end of the image
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (8 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 09/15] qcow2: truncate preallocated space Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 11/15] qcow2: move is_zero() up Anton Nefedov
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, den, kwolf, mreitz, eblake, Pavel Butsykin,
	Denis V . Lunev, Anton Nefedov

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

Preallocated space in the image may remain unused; the patch adds
the functionality to identify and fix it in the qcow2_check
to avoid wasting storage space on the host.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c                      |  32 ++++++++++++
 tests/qemu-iotests/026.out         | 104 ++++++++++++++++++++++++++++---------
 tests/qemu-iotests/026.out.nocache | 104 ++++++++++++++++++++++++++++---------
 tests/qemu-iotests/029.out         |   5 +-
 tests/qemu-iotests/060.out         |  10 +++-
 tests/qemu-iotests/061.out         |   5 +-
 tests/qemu-iotests/066.out         |   5 +-
 tests/qemu-iotests/098.out         |   7 ++-
 tests/qemu-iotests/108.out         |   5 +-
 tests/qemu-iotests/112.out         |   5 +-
 10 files changed, 226 insertions(+), 56 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4696106..f522ba9 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -530,6 +530,33 @@ int qcow2_mark_consistent(BlockDriverState *bs)
     return 0;
 }
 
+static int qcow2_check_extra_preallocation(BlockDriverState *bs,
+    BdrvCheckResult *res, BdrvCheckMode fix)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t img_size = bdrv_getlength(bs->file->bs);
+
+    if (res->image_end_offset < img_size) {
+        uint64_t count =
+            DIV_ROUND_UP(img_size - res->image_end_offset, s->cluster_size);
+        fprintf(stderr, "%s space leaked at the end of the image %jd\n",
+                fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
+                img_size - res->image_end_offset);
+        res->leaks += count;
+        if (fix & BDRV_FIX_LEAKS) {
+            int ret = bdrv_truncate(bs->file, res->image_end_offset,
+                                    PREALLOC_MODE_OFF, NULL);
+            if (ret < 0) {
+                res->check_errors++;
+                return ret;
+            }
+            res->leaks_fixed += count;
+        }
+    }
+
+    return 0;
+}
+
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
                        BdrvCheckMode fix)
 {
@@ -538,6 +565,11 @@ static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
         return ret;
     }
 
+    ret = qcow2_check_extra_preallocation(bs, result, fix);
+    if (ret < 0) {
+        return ret;
+    }
+
     if (fix && result->check_errors == 0 && result->corruptions == 0) {
         ret = qcow2_mark_clean(bs);
         if (ret < 0) {
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 86a50a2..bc572dc 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -5,7 +5,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write -b
@@ -33,7 +36,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write -b
@@ -181,7 +187,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 5; imm: off; once: on; write
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 5; imm: off; once: on; write -b
@@ -207,7 +216,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 28; imm: off; once: on; write -b
@@ -468,20 +480,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 5120
 
-55 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -489,19 +508,26 @@ Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 265728
 
-251 leaked clusters were found on the image.
+770 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write
@@ -520,20 +546,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 27648
 
-11 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -541,27 +574,35 @@ Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 21504
 
-23 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-11 leaked clusters were found on the image.
+268 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -569,27 +610,35 @@ Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-23 leaked clusters were found on the image.
+280 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-11 leaked clusters were found on the image.
+270 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -597,8 +646,9 @@ Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-23 leaked clusters were found on the image.
+282 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 
 === L1 growth tests ===
@@ -651,7 +701,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: on
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: off
@@ -665,7 +718,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: on
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: off
diff --git a/tests/qemu-iotests/026.out.nocache b/tests/qemu-iotests/026.out.nocache
index ea2e166..167fb78 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -5,7 +5,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write 
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 5; imm: off; once: on; write -b
@@ -33,7 +36,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 28; imm: off; once: on; write -b
@@ -189,7 +195,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 5; imm: off; once: on; write 
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_alloc_write; errno: 5; imm: off; once: on; write -b
@@ -215,7 +224,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_alloc_write; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_alloc_write; errno: 28; imm: off; once: on; write -b
@@ -476,20 +488,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 5120
 
-55 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -497,19 +516,26 @@ Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 265728
 
-251 leaked clusters were found on the image.
+770 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write 
@@ -528,20 +554,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 27648
 
-11 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -549,27 +582,35 @@ Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 21504
 
-23 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-11 leaked clusters were found on the image.
+268 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -577,27 +618,35 @@ Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-23 leaked clusters were found on the image.
+280 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-11 leaked clusters were found on the image.
+270 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -605,8 +654,9 @@ Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-23 leaked clusters were found on the image.
+282 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 
 === L1 growth tests ===
@@ -659,7 +709,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: on
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: off
@@ -673,7 +726,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: on
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: off
diff --git a/tests/qemu-iotests/029.out b/tests/qemu-iotests/029.out
index 5bc93e0..7176e38 100644
--- a/tests/qemu-iotests/029.out
+++ b/tests/qemu-iotests/029.out
@@ -6,7 +6,10 @@ is smaller than the current L1 table.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 wrote 4096/4096 bytes at offset 0
 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 65536
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=16777216
 wrote 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 5ca3af4..a20e267 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -78,20 +78,26 @@ Leaked cluster 9 refcount=1 reference=0
 Repairing cluster 4 refcount=1 reference=2
 Repairing cluster 9 refcount=1 reference=0
 Repairing OFLAG_COPIED data cluster: l2_entry=8000000000040000 refcount=2
+ERROR space leaked at the end of the image 65536
 The following inconsistencies were found and repaired:
 
     1 leaked clusters
     2 corruptions
 
 Double checking the fixed image now...
-No errors were found on the image.
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 incompatible_features     0x0
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 incompatible_features     0x0
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 131072
+
+2 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index a431b7f..69fb929 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -328,7 +328,10 @@ wrote 131072/131072 bytes at offset 0
 No errors were found on the image.
 read 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 196608
+
+3 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 read 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 3d9da9b..f94aa5c 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -13,7 +13,10 @@ discard 67109376/67109376 bytes at offset 0
 64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 67109376/67109376 bytes at offset 0
 64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 327680
+
+5 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Offset          Length          Mapped to       File
 
 === Writing to preallocated zero clusters ===
diff --git a/tests/qemu-iotests/098.out b/tests/qemu-iotests/098.out
index 7634d0e..733249f 100644
--- a/tests/qemu-iotests/098.out
+++ b/tests/qemu-iotests/098.out
@@ -20,7 +20,10 @@ Leaked cluster 4 refcount=1 reference=0
 Leaked cluster 5 refcount=1 reference=0
 Repairing cluster 4 refcount=1 reference=0
 Repairing cluster 5 refcount=1 reference=0
-No errors were found on the image.
+ERROR space leaked at the end of the image 131072
+
+2 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 
 === reftable_update ===
 
@@ -34,6 +37,7 @@ ERROR cluster 1 refcount=0 reference=1
 ERROR cluster 3 refcount=0 reference=1
 Rebuilding refcount structure
 Repairing cluster 1 refcount=1 reference=0
+Repairing space leaked at the end of the image 65536
 No errors were found on the image.
 
 === refblock_alloc ===
@@ -48,5 +52,6 @@ ERROR cluster 1 refcount=0 reference=1
 ERROR cluster 3 refcount=0 reference=1
 Rebuilding refcount structure
 Repairing cluster 1 refcount=1 reference=0
+Repairing space leaked at the end of the image 65536
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/108.out b/tests/qemu-iotests/108.out
index 75bab8d..d178b1e 100644
--- a/tests/qemu-iotests/108.out
+++ b/tests/qemu-iotests/108.out
@@ -68,13 +68,16 @@ Rebuilding refcount structure
 Repairing cluster 1 refcount=1 reference=0
 Repairing cluster 2 refcount=1 reference=0
 Repairing cluster 16 refcount=1 reference=0
+ERROR space leaked at the end of the image 720896
 The following inconsistencies were found and repaired:
 
     0 leaked clusters
     2 corruptions
 
 Double checking the fixed image now...
-No errors were found on the image.
+
+11 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 
 --- Signed overflow after the refblock ---
 
diff --git a/tests/qemu-iotests/112.out b/tests/qemu-iotests/112.out
index 81b04d1..7386be8 100644
--- a/tests/qemu-iotests/112.out
+++ b/tests/qemu-iotests/112.out
@@ -93,7 +93,10 @@ refcount bits: 1
 
 === Amend from refcount_bits=1 to refcount_bits=64 ===
 
-No errors were found on the image.
+ERROR space leaked at the end of the image 131072
+
+2 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 refcount bits: 64
 
 === Amend to compat=0.10 ===
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 11/15] qcow2: move is_zero() up
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (9 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 10/15] qcow2: check space leak at the end of the image Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 12/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

To be used in the following commit without a forward declaration.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c | 45 ++++++++++++++++++++++-----------------------
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index f522ba9..2ec8b03 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1946,6 +1946,28 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
     return false;
 }
 
+static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+    int64_t nr;
+    int64_t res;
+    int64_t start;
+
+    /* Widen to sector boundaries, then clamp to image length, before
+     * checking status of underlying sectors */
+    start = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+    bytes = QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE) - start;
+
+    if (start + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - start;
+    }
+
+    if (!bytes) {
+        return true;
+    }
+    res = bdrv_block_status_above(bs, NULL, start, bytes, &nr, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+}
+
 /*
  * If the specified area is beyond EOF, allocates it + prealloc_size
  * bytes ahead.
@@ -3088,29 +3110,6 @@ finish:
     return ret;
 }
 
-
-static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
-{
-    int64_t nr;
-    int64_t res;
-    int64_t start;
-
-    /* Widen to sector boundaries, then clamp to image length, before
-     * checking status of underlying sectors */
-    start = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
-    bytes = QEMU_ALIGN_UP(offset + bytes, BDRV_SECTOR_SIZE) - start;
-
-    if (start + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
-        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - start;
-    }
-
-    if (!bytes) {
-        return true;
-    }
-    res = bdrv_block_status_above(bs, NULL, start, bytes, &nr, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
-}
-
 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 12/15] qcow2: skip writing zero buffers to empty COW areas
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (10 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 11/15] qcow2: move is_zero() up Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 13/15] qcow2: allocate image space by-cluster Anton Nefedov
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

It can be detected that
  1. COW alignment of a write request is zeroes
  2. Respective areas on the underlying BDS already read as zeroes
     after being preallocated previously

If both of these true, COW may be skipped

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h         | 12 +++++++++++
 block/qcow2-cluster.c |  5 ++++-
 block/qcow2.c         | 60 ++++++++++++++++++++++++++++++++++++++++++++-------
 block/trace-events    |  1 +
 4 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 595ed9c..db1c6f5 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -363,6 +363,12 @@ typedef struct QCowL2Meta
     bool keep_old_clusters;
 
     /**
+     * True if the area is allocated at the end of data area
+     * (i.e. >= BDRVQcow2State::data_end)
+     */
+    bool clusters_are_trailing;
+
+    /**
      * Requests that overlap with this allocation and wait to be restarted
      * when the allocating request has completed.
      */
@@ -381,6 +387,12 @@ typedef struct QCowL2Meta
     Qcow2COWRegion cow_end;
 
     /**
+     * Indicates that both COW areas are empty (nb_bytes == 0)
+     * or filled with zeroes and do not require any more copying
+     */
+    bool zero_cow;
+
+    /**
      * The I/O vector with the data from the actual guest write request.
      * If non-NULL, this is meant to be merged together with the data
      * from @cow_start and @cow_end into one single write operation.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 75baaf4..d54b96a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -735,7 +735,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->offset + start->nb_bytes <= end->offset);
     assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
-    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
+    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->zero_cow) {
         return 0;
     }
 
@@ -1203,6 +1203,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t *host_offset, uint64_t *bytes, QCowL2Meta **m)
 {
     BDRVQcow2State *s = bs->opaque;
+    const uint64_t old_data_end = s->data_end;
     int l2_index;
     uint64_t *l2_table;
     uint64_t entry;
@@ -1324,6 +1325,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         .alloc_offset   = alloc_cluster_offset,
         .offset         = start_of_cluster(s, guest_offset),
         .nb_clusters    = nb_clusters,
+        .clusters_are_trailing = alloc_cluster_offset >= old_data_end,
 
         .keep_old_clusters  = keep_old_clusters,
 
@@ -1335,6 +1337,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
             .offset     = nb_bytes,
             .nb_bytes   = avail_bytes - nb_bytes,
         },
+        .zero_cow = false,
     };
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.c b/block/qcow2.c
index 2ec8b03..e49ad50 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1921,6 +1921,11 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
             continue;
         }
 
+        /* If both COW regions are zeroes already, skip this too */
+        if (m->zero_cow) {
+            continue;
+        }
+
         /* The data (middle) region must be immediately after the
          * start region */
         if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
@@ -1971,26 +1976,61 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
 /*
  * If the specified area is beyond EOF, allocates it + prealloc_size
  * bytes ahead.
+ *
+ * Returns
+ *   true if the space is allocated and contains zeroes
  */
-static void coroutine_fn handle_prealloc(BlockDriverState *bs,
+static bool coroutine_fn handle_prealloc(BlockDriverState *bs,
                                          const QCowL2Meta *m)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t start = m->alloc_offset;
     uint64_t end = start + m->nb_clusters * s->cluster_size;
+    int ret;
     int64_t flen = bdrv_getlength(bs->file->bs);
 
     if (flen < 0) {
-        return;
+        return false;
     }
 
     if (end > flen) {
         /* try to alloc host space in one chunk for better locality */
-        bdrv_co_pwrite_zeroes(bs->file, flen,
-                              QEMU_ALIGN_UP(end + s->prealloc_size - flen,
-                                            s->cluster_size),
-                              BDRV_REQ_ALLOCATE);
+        ret = bdrv_co_pwrite_zeroes(bs->file, flen,
+                                    QEMU_ALIGN_UP(end + s->prealloc_size - flen,
+                                                  s->cluster_size),
+                                    BDRV_REQ_ALLOCATE);
+        if (ret < 0) {
+            return false;
+        }
     }
+
+    /* We're safe to assume that the area is zeroes if the area
+     * was allocated at the end of data (s->data_end).
+     * In this case, the only way for file length to be bigger is that
+     * the area was preallocated by this or another request.
+     */
+    return m->clusters_are_trailing;
+}
+
+static bool check_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+{
+    if (bs->encrypted) {
+        return false;
+    }
+
+    if (m->cow_start.nb_bytes != 0 &&
+        !is_zero(bs, m->offset + m->cow_start.offset, m->cow_start.nb_bytes))
+    {
+        return false;
+    }
+
+    if (m->cow_end.nb_bytes != 0 &&
+        !is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes))
+    {
+        return false;
+    }
+
+    return true;
 }
 
 static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
@@ -1999,8 +2039,12 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     QCowL2Meta *m;
 
     for (m = l2meta; m != NULL; m = m->next) {
-        if (s->prealloc_size) {
-            handle_prealloc(bs, m);
+        if (s->prealloc_size && handle_prealloc(bs, m)) {
+            if (check_zero_cow(bs, m)) {
+                trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset,
+                                     m->nb_clusters);
+                m->zero_cow = true;
+            }
         }
     }
 }
diff --git a/block/trace-events b/block/trace-events
index 13a5a87..faf1811 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
 qcow2_writev_data(void *co, uint64_t offset) "co %p offset %" PRIx64
 qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset %" PRIx64 " count %d"
 qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset %" PRIx64 " count %d"
+qcow2_skip_cow(void* co, uint64_t offset, int nb_clusters) "co %p offset %" PRIx64 " nb_clusters %d"
 
 # block/qcow2-cluster.c
 qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset %" PRIx64 " bytes %d"
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 13/15] qcow2: allocate image space by-cluster
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (11 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 12/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE Anton Nefedov
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

If COW areas of the newly allocated clusters are zeroes on the backing image:
(even if preallocation feature is not used or it cannot detect if the image
already reads as zeroes, e.g. writing to a hole / preallocated zero cluster)
efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
cluster instead of writing explicit zero buffers later in perform_cow().

iotest 060:
write to the discarded cluster does not trigger COW anymore.
so, break on write_aio event instead, will work for the test
(but write won't fail anymore, so update reference output)

iotest 066:
cluster-alignment areas that were not really COWed are now detected
as zeroes, hence the initial write has to be exactly the same size for
the maps to match

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c              | 22 +++++++++++++++++-----
 tests/qemu-iotests/060     |  2 +-
 tests/qemu-iotests/060.out |  3 ++-
 tests/qemu-iotests/066     |  2 +-
 tests/qemu-iotests/066.out |  4 ++--
 5 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index e49ad50..9c49d40 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2039,13 +2039,25 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     QCowL2Meta *m;
 
     for (m = l2meta; m != NULL; m = m->next) {
-        if (s->prealloc_size && handle_prealloc(bs, m)) {
-            if (check_zero_cow(bs, m)) {
-                trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset,
-                                     m->nb_clusters);
-                m->zero_cow = true;
+        bool preallocated_zeroes = s->prealloc_size && handle_prealloc(bs, m);
+
+        if (!check_zero_cow(bs, m)) {
+            continue;
+        }
+
+        if (!preallocated_zeroes &&
+            (m->cow_start.nb_bytes != 0 || m->cow_end.nb_bytes != 0))
+        {
+            if (bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
+                                      m->nb_clusters * s->cluster_size,
+                                      BDRV_REQ_ALLOCATE) != 0)
+            {
+                continue;
             }
         }
+
+        trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
+        m->zero_cow = true;
     }
 }
 
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 8e95c45..3a0f096 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -160,7 +160,7 @@ poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
 # any unallocated cluster, leading to an attempt to overwrite the second L2
 # table. Finally, resume the COW write and see it fail (but not crash).
 echo "open -o file.driver=blkdebug $TEST_IMG
-break cow_read 0
+break write_aio 0
 aio_write 0k 1k
 wait_break 0
 write 64k 64k
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index a20e267..290ccec 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -114,7 +114,8 @@ qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps
 blkdebug: Suspended request '0'
 write failed: Input/output error
 blkdebug: Resuming request '0'
-aio_write failed: No medium found
+wrote 1024/1024 bytes at offset 0
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 === Testing unallocated image header ===
 
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 8638217..3c216a1 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -71,7 +71,7 @@ echo
 _make_test_img $IMG_SIZE
 
 # Create data clusters (not aligned to an L2 table)
-$QEMU_IO -c 'write -P 42 1M 256k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -P 42 $(((1024 + 32) * 1024)) 192k" "$TEST_IMG" | _filter_qemu_io
 orig_map=$($QEMU_IMG map --output=json "$TEST_IMG")
 
 # Convert the data clusters to preallocated zero clusters
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index f94aa5c..81ef795 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -22,8 +22,8 @@ Offset          Length          Mapped to       File
 === Writing to preallocated zero clusters ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67109376
-wrote 262144/262144 bytes at offset 1048576
-256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 1081344
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 262144/262144 bytes at offset 1048576
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 196608/196608 bytes at offset 1081344
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (12 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 13/15] qcow2: allocate image space by-cluster Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-08-04 15:27   ` Eric Blake
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 15/15] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  2017-09-21 17:16 ` [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
  15 siblings, 1 reply; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 tests/qemu-iotests/190     | 146 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/190.out |  50 ++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 197 insertions(+)
 create mode 100755 tests/qemu-iotests/190
 create mode 100644 tests/qemu-iotests/190.out

diff --git a/tests/qemu-iotests/190 b/tests/qemu-iotests/190
new file mode 100755
index 0000000..ad7162a
--- /dev/null
+++ b/tests/qemu-iotests/190
@@ -0,0 +1,146 @@
+#!/bin/env bash
+#
+# Test qcow2 BDRV_REQ_ALLOCATE requests
+#
+# Copyright (c) 2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+##
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+function do_io()
+{
+    $QEMU_IO "$@" | _filter_qemu_io |\
+        sed -e 's/bytes at offset [0-9]*/bytes at offset XXX/g'
+}
+
+CLUSTER_SIZE=64k
+size=128M
+
+_make_test_img 1G
+
+echo
+echo "== Test discarded cluster reuse =="
+
+# allocate first two clusters
+do_io -c "writev -P 1 0x8000 0x10000" "$TEST_IMG"
+len=$(stat -c "%s" $TEST_IMG)
+
+# discard the 1st cluster on qcow2 level only
+do_io -c "open -o pass-discard-request=off $TEST_IMG" -c "discard 0 0x10000"
+
+# new write will reuse the dirty host cluster and has to overwrite that
+#  with zeroes
+do_io -c "writev -P 2 0x24000 0x8000" "$TEST_IMG"
+if [ $len -ne $(stat -c "%s" $TEST_IMG) ] ; then
+    >&2 echo "Failed to reuse cluster"
+    exit 1
+fi
+
+echo
+echo "== Test preallocation =="
+
+function io_commands()
+{
+    echo "open -o prealloc-size=$((1024*1024)) blkdebug::$TEST_IMG"
+
+    # Verify that intersections of a running preallocation and new requests
+    #  is handled properly.
+    #
+    # 1. send a write #1 which triggers preallocation, suspend it in action
+    # 2. send a write #2 which intersects with the area being preallocated
+    # 3. using break/wait_break/resume, wait until write #2 is at least
+    #    at WRITE_AIO tracepoint.
+    #    Then it is supposed to enter pwrite(bs->child) and start waiting
+    #    for #1 to finish
+    # 4. resume #1
+
+cat <<EOF
+break pwritev_zero A
+aio_write -P 3 0x30000 0x1000
+wait_break A
+
+break write_aio B
+aio_write -P 4 0x40000 0x1000
+wait_break B
+resume B
+
+resume A
+aio_flush
+EOF
+
+    # Verify that new cluster in the preallocated area triggers
+    #  neither new preallocation nor COW read
+    #
+    # TODO: this test will not fail but hang. Better ideas?
+    #       wait and kill by timeout?
+
+cat <<EOF
+break pwritev_zero A
+break cow_read B
+writev -P 5 0x51000 0x1000
+EOF
+}
+
+io_commands | do_io
+
+echo
+echo "== Verify image content =="
+
+function verify_io()
+{
+cat <<EOF
+read -P 0 0 0x10000
+read -P 1 0x10000 0x8000
+read -P 0 0x18000 0xc000
+read -P 2 0x24000 0x8000
+read -P 0 0x2c000 0x4000
+
+read -P 3 0x30000 0x1000
+read -P 0 0x31000 0xf000
+read -P 4 0x40000 0x1000
+read -P 0 0x41000 0xf000
+
+read -P 0 0x50000 0x1000
+read -P 5 0x51000 0x1000
+read -P 0 0x52000 0xe000
+EOF
+}
+
+verify_io | do_io $TEST_IMG
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/190.out b/tests/qemu-iotests/190.out
new file mode 100644
index 0000000..46597ab
--- /dev/null
+++ b/tests/qemu-iotests/190.out
@@ -0,0 +1,50 @@
+QA output created by 190
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+
+== Test discarded cluster reuse ==
+wrote 65536/65536 bytes at offset XXX
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 65536/65536 bytes at offset XXX
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset XXX
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Test preallocation ==
+blkdebug: Suspended request 'A'
+blkdebug: Suspended request 'B'
+blkdebug: Resuming request 'B'
+blkdebug: Resuming request 'A'
+wrote 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Verify image content ==
+read 65536/65536 bytes at offset XXX
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset XXX
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 49152/49152 bytes at offset XXX
+48 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset XXX
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16384/16384 bytes at offset XXX
+16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 61440/61440 bytes at offset XXX
+60 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 61440/61440 bytes at offset XXX
+60 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 57344/57344 bytes at offset XXX
+56 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 2aba585..876b23f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -183,3 +183,4 @@
 185 rw auto
 188 rw auto quick
 189 rw auto quick
+190 rw auto
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Qemu-devel] [PATCH v4 15/15] iotest 134: test cluster-misaligned encrypted write
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (13 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE Anton Nefedov
@ 2017-08-01 14:19 ` Anton Nefedov
  2017-09-21 17:16 ` [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-01 14:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake, Anton Nefedov

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 9914415..6083ae4 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -59,6 +59,15 @@ echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
 echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index 972be49..09d46f6 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -5,6 +5,16 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 for-2.10 01/15] mirror: inherit supported write/zero flags
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 01/15] mirror: inherit supported write/zero flags Anton Nefedov
@ 2017-08-04 15:13   ` Eric Blake
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2017-08-04 15:13 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz

[-- Attachment #1: Type: text/plain, Size: 1139 bytes --]

On 08/01/2017 09:18 AM, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  block/mirror.c | 5 +++++
>  1 file changed, 5 insertions(+)

I think this one counts as a bug fix worthy of inclusion in 2.10.

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/mirror.c b/block/mirror.c
> index d46dace..7e539f1 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1056,6 +1056,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>      bdrv_refresh_filename(bs->backing->bs);
>      pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>              bs->backing->bs->filename);
> +    bs->supported_write_flags = BDRV_REQ_FUA &
> +        bs->backing->bs->supported_write_flags;
> +    bs->supported_zero_flags =
> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +        bs->backing->bs->supported_zero_flags;
>  }
>  
>  static void bdrv_mirror_top_close(BlockDriverState *bs)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 02/15] blkverify: set supported write/zero flags
  2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 02/15] blkverify: set " Anton Nefedov
@ 2017-08-04 15:23   ` Eric Blake
  2017-08-07 12:19     ` Anton Nefedov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2017-08-04 15:23 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz

[-- Attachment #1: Type: text/plain, Size: 2034 bytes --]

On 08/01/2017 09:18 AM, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  block/blkverify.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Basically, blkverify supports a flag if BOTH of its underlying files
also support the flag; if either side can't handle the flag, then we
fall back to emulation for both sides.

With more overhead, we COULD state that we support both bits if at least
one of the two underlying BDS supports the bit, and then emulate support
for the bit on the second BDS where it was lacking, so that at least the
first BDS doesn't suffer from the penalties of the fallbacks.  But that
means duplicating the block layer fallback code in blkverify, which is
already something that we don't necessarily expect high performance from.

For FUA, failure to implement the bit merely means that we have more
device-wide flush calls (instead of per-transaction mini-flushes), but
the end data should be the same.  But for MAY_UNMAP, I'm worried that we
may have situations where a plain BDS will create holes, while running
the same device paired through blkverify will fall back to slower
explicit zeroes.  I'm wondering whether this will bite us, if we have
scenarios where the mere fact of trying to verify block device behavior
changes what behavior we are even verifying.

Thus, while I think the code change _looks_ okay, I'm not sure if it is
correct design-wise, nor whether it is 2.10 material.

> +    bs->supported_write_flags = BDRV_REQ_FUA &
> +        bs->file->bs->supported_write_flags &
> +        s->test_file->bs->supported_write_flags;
> +
> +    bs->supported_zero_flags =
> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +        bs->file->bs->supported_zero_flags &
> +        s->test_file->bs->supported_zero_flags;
> +
>      ret = 0;
>  fail:
>      qemu_opts_del(opts);
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE Anton Nefedov
@ 2017-08-04 15:27   ` Eric Blake
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Blake @ 2017-08-04 15:27 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz

[-- Attachment #1: Type: text/plain, Size: 1099 bytes --]

On 08/01/2017 09:19 AM, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  tests/qemu-iotests/190     | 146 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/190.out |  50 ++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 197 insertions(+)
>  create mode 100755 tests/qemu-iotests/190
>  create mode 100644 tests/qemu-iotests/190.out

190 already exists on mainline, you'll have to choose an available test
id that hasn't been mentioned on-list yet (yeah, our process for
reserving test ids is lousy. Max has already said he doesn't mind fixing
conflicts if it is just a duplicated test number, for qemu-iotests that
he manages)


> +_cleanup()
> +{
> +	_cleanup_test_img
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15

This part won't be needed, if you land after Stefan's patch that adds
per-test directories with the ability to preserve temporary files

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2017-08-04 19:51   ` Eric Blake
  2017-08-29 12:47   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Blake @ 2017-08-04 19:51 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz

[-- Attachment #1: Type: text/plain, Size: 3466 bytes --]

On 08/01/2017 09:19 AM, Anton Nefedov wrote:
> The flag is supposed to indicate that the region of the disk image has
> to be sufficiently allocated so it reads as zeroes. The call with the flag
> set has to return -ENOTSUP if allocation cannot be done efficiently
> (i.e. without falling back to writing actual buffers)
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  include/block/block.h     |  6 +++++-
>  include/block/block_int.h |  2 +-
>  block/io.c                | 20 +++++++++++++++++---
>  3 files changed, 23 insertions(+), 5 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

You might want the commit message to be a bit more verbose...

> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 7fe0125..828da67 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -65,9 +65,13 @@ typedef enum {
>      BDRV_REQ_NO_SERIALISING     = 0x8,
>      BDRV_REQ_FUA                = 0x10,
>      BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
> +     * efficiently allocate the space so it reads as zeroes, or return an error.
> +     */
> +    BDRV_REQ_ALLOCATE           = 0x40,
>  
>      /* Mask of valid flags */
> -    BDRV_REQ_MASK               = 0x3f,
> +    BDRV_REQ_MASK               = 0x7f,
>  } BdrvRequestFlags;
>  
>  typedef struct BlockSizes {
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 9b94b32..9b64411 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -585,7 +585,7 @@ struct BlockDriverState {
>      /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>      unsigned int supported_write_flags;
>      /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
> -     * BDRV_REQ_MAY_UNMAP) */
> +     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
>      unsigned int supported_zero_flags;

...in addition to adding the new flag here and documenting its semantics
for drivers...

>  
>      /* the following member gives a name to every node on the bs graph. */
> diff --git a/block/io.c b/block/io.c
> index 375fc66..04d495e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1245,7 +1245,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>              assert(!bs->supported_zero_flags);
>          }
>  
> -        if (ret == -ENOTSUP) {
> +        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
>              /* Fall back to bounce buffer if write zeroes is unsupported */
>              BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;

...you also made sure that anywhere the flag is in use you avoid a slow
fallback...

> @@ -1639,6 +1645,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
>  {
>      trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
>  
> +    assert(!(flags & BDRV_REQ_MAY_UNMAP && flags & BDRV_REQ_ALLOCATE));
> +
> +    if (flags & BDRV_REQ_ALLOCATE &&
> +        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
> +    {
> +        return -ENOTSUP;

...as well as providing a sane default to make the flag always trigger
-ENOTSUP until individual drivers implement something in later patches.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag Anton Nefedov
@ 2017-08-04 20:00   ` Eric Blake
  2017-08-07 12:06     ` Anton Nefedov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Blake @ 2017-08-04 20:00 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz

[-- Attachment #1: Type: text/plain, Size: 763 bytes --]

On 08/01/2017 09:19 AM, Anton Nefedov wrote:
> Qcow2State and BlockDriverState flags have to be in sync
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  block/qcow2.c | 1 +
>  1 file changed, 1 insertion(+)

Is this a bug fix needed for 2.10?

> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 66aa8c2..2a1d2f2 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2138,6 +2138,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>  
>      if (result == 0) {
>          qcow2_mark_clean(bs);
> +        s->flags |= BDRV_O_INACTIVE;
>      }
>  
>      return result;
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag
  2017-08-04 20:00   ` Eric Blake
@ 2017-08-07 12:06     ` Anton Nefedov
  0 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-07 12:06 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz


On 08/04/2017 11:00 PM, Eric Blake wrote:
> On 08/01/2017 09:19 AM, Anton Nefedov wrote:
>> Qcow2State and BlockDriverState flags have to be in sync
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>>   block/qcow2.c | 1 +
>>   1 file changed, 1 insertion(+)
> 
> Is this a bug fix needed for 2.10?
> 

I don't think this bites us now, but yes looks like a bug

>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 66aa8c2..2a1d2f2 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -2138,6 +2138,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>>   
>>       if (result == 0) {
>>           qcow2_mark_clean(bs);
>> +        s->flags |= BDRV_O_INACTIVE;
>>       }
>>   
>>       return result;
>>
> 
/Anton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 02/15] blkverify: set supported write/zero flags
  2017-08-04 15:23   ` Eric Blake
@ 2017-08-07 12:19     ` Anton Nefedov
  0 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-08-07 12:19 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: qemu-block, den, kwolf, mreitz

On 08/04/2017 06:23 PM, Eric Blake wrote:
> On 08/01/2017 09:18 AM, Anton Nefedov wrote:
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>>   block/blkverify.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
> 
> Basically, blkverify supports a flag if BOTH of its underlying files
> also support the flag; if either side can't handle the flag, then we
> fall back to emulation for both sides.
> 
> With more overhead, we COULD state that we support both bits if at least
> one of the two underlying BDS supports the bit, and then emulate support
> for the bit on the second BDS where it was lacking, so that at least the
> first BDS doesn't suffer from the penalties of the fallbacks.  But that
> means duplicating the block layer fallback code in blkverify, which is
> already something that we don't necessarily expect high performance from.
> 
> For FUA, failure to implement the bit merely means that we have more
> device-wide flush calls (instead of per-transaction mini-flushes), but
> the end data should be the same.  But for MAY_UNMAP, I'm worried that we
> may have situations where a plain BDS will create holes, while running
> the same device paired through blkverify will fall back to slower
> explicit zeroes.  I'm wondering whether this will bite us, if we have
> scenarios where the mere fact of trying to verify block device behavior
> changes what behavior we are even verifying.
> 
> Thus, while I think the code change _looks_ okay, I'm not sure if it is
> correct design-wise, nor whether it is 2.10 material.
> 

But this change doesn't make things worse, does it?
Blkverify will support at least the matching bytes, while now it drops
them all.

Will the 'reference' driver (I guess it's raw/file in most of the cases)
usually support a superset of 'tested' driver supported bits?
Maybe we should report an error otherwise?


>> +    bs->supported_write_flags = BDRV_REQ_FUA &
>> +        bs->file->bs->supported_write_flags &
>> +        s->test_file->bs->supported_write_flags;
>> +
>> +    bs->supported_zero_flags =
>> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
>> +        bs->file->bs->supported_zero_flags &
>> +        s->test_file->bs->supported_zero_flags;
>> +
>>       ret = 0;
>>   fail:
>>       qemu_opts_del(opts);
>>
> 
/Anton

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
  2017-08-04 19:51   ` Eric Blake
@ 2017-08-29 12:47   ` Alberto Garcia
  1 sibling, 0 replies; 27+ messages in thread
From: Alberto Garcia @ 2017-08-29 12:47 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: kwolf, den, qemu-block, mreitz

On Tue 01 Aug 2017 04:19:00 PM CEST, Anton Nefedov wrote:
> The flag is supposed to indicate that the region of the disk image has
> to be sufficiently allocated so it reads as zeroes. The call with the flag
> set has to return -ENOTSUP if allocation cannot be done efficiently
> (i.e. without falling back to writing actual buffers)
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

> +    /* allocation request with qiov provided doesn't make much sense */
> +    assert(!(qiov && flags & BDRV_REQ_ALLOCATE));

> +    assert(!(flags & BDRV_REQ_MAY_UNMAP && flags & BDRV_REQ_ALLOCATE));
> +
> +    if (flags & BDRV_REQ_ALLOCATE &&
> +        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))

I find it more readable with parentheses like this:

  assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
  assert(!((flags & BDRV_REQ_MAY_UNMAP) && (flags & BDRV_REQ_ALLOCATE)));
  if ((flags & BDRV_REQ_ALLOCATE) &&
      !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))

but your code is correct as it is.

Berto

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 05/15] file-posix: support BDRV_REQ_ALLOCATE
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2017-08-29 13:00   ` Alberto Garcia
  0 siblings, 0 replies; 27+ messages in thread
From: Alberto Garcia @ 2017-08-29 13:00 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: kwolf, den, qemu-block, mreitz

On Tue 01 Aug 2017 04:19:02 PM CEST, Anton Nefedov wrote:
> Current write_zeroes implementation is good enough to satisfy this flag too
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v4 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2017-08-29 13:07   ` Alberto Garcia
  0 siblings, 0 replies; 27+ messages in thread
From: Alberto Garcia @ 2017-08-29 13:07 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: kwolf, den, qemu-block, mreitz

On Tue 01 Aug 2017 04:19:03 PM CEST, Anton Nefedov wrote:
> Support the flag if the underlying BDS supports it
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements
  2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (14 preceding siblings ...)
  2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 15/15] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
@ 2017-09-21 17:16 ` Anton Nefedov
  15 siblings, 0 replies; 27+ messages in thread
From: Anton Nefedov @ 2017-09-21 17:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, den, kwolf, mreitz, eblake


On 1/8/2017 5:18 PM, Anton Nefedov wrote:
> 
> This pull request is to address a few performance problems of qcow2 format:
> 
>    1. non cluster-aligned write requests (to unallocated clusters) explicitly
>      pad data with zeroes if there is no backing data. This can be avoided
>      and the whole clusters are preallocated and zeroed in a single
>      efficient write_zeroes() operation, also providing better host file
>      continuity
> 
>    2. moreover, efficient write_zeroes() operation can be used to preallocate
>      space megabytes ahead which gives noticeable improvement on some storage
>      types (e.g. distributed storages where space allocation operation is
>      expensive)
> 

ping?
I'd send the rebased series but there are no real conflicts so

/thanks,
Anton

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-09-21 17:16 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-01 14:18 [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 01/15] mirror: inherit supported write/zero flags Anton Nefedov
2017-08-04 15:13   ` [Qemu-devel] [PATCH v4 for-2.10 " Eric Blake
2017-08-01 14:18 ` [Qemu-devel] [PATCH v4 02/15] blkverify: set " Anton Nefedov
2017-08-04 15:23   ` Eric Blake
2017-08-07 12:19     ` Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
2017-08-04 19:51   ` Eric Blake
2017-08-29 12:47   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 04/15] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
2017-08-29 13:00   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
2017-08-29 13:07   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 07/15] qcow2: preallocation at image expand Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 08/15] qcow2: set inactive flag Anton Nefedov
2017-08-04 20:00   ` Eric Blake
2017-08-07 12:06     ` Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 09/15] qcow2: truncate preallocated space Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 10/15] qcow2: check space leak at the end of the image Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 11/15] qcow2: move is_zero() up Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 12/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 13/15] qcow2: allocate image space by-cluster Anton Nefedov
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 14/15] iotest 190: test BDRV_REQ_ALLOCATE Anton Nefedov
2017-08-04 15:27   ` Eric Blake
2017-08-01 14:19 ` [Qemu-devel] [PATCH v4 15/15] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
2017-09-21 17:16 ` [Qemu-devel] [PATCH v4 00/15] qcow2: space preallocation and COW improvements Anton Nefedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.