All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements
@ 2017-11-01 15:43 Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags Anton Nefedov
                   ` (15 more replies)
  0 siblings, 16 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

v5: rebased, patches slightly reordered;
    cluster allocation (to avoid writing zero-buffers)
    now goes before an optional ahead preallocation

v4: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg00109.html

This pull request is to improve a few performance problems of qcow2 format:

  1. non cluster-aligned write requests (to unallocated clusters) explicitly
     pad data with zeroes if there is no backing data.
     Resulting increase in ops number and potential cluster fragmentation
     (on the host file) is already solved by:
       ee22a9d qcow2: Merge the writing of the COW regions with the guest data
     However, in case of zero COW regions, that can be avoided at all
     but the whole clusters are preallocated and zeroed in a single
     efficient write_zeroes() operation

  2. moreover, efficient write_zeroes() operation can be used to preallocate
     space megabytes (*configurable number) ahead which gives noticeable
     improvement on some storage types (e.g. distributed storage)
     where the space allocation operation might be expensive)

  3. this will also allow to enable simultaneous writes to the same unallocated
     cluster after the space has been allocated & zeroed but before
     the first data is written and the cluster is linked to L2.
     (Not included in this patchset).

Efficient write_zeroes usually implies that the blocks are not actually
written to but only reserved and marked as zeroed by the storage.
In this patchset, file-posix driver is marked as supporting this operation
if it supports (/configured to support) fallocate() operation.

Existing bdrv_write_zeroes() falls back to writing zero buffers if
write_zeroes is not supported by the driver.
A new flag (BDRV_REQ_ALLOCATE) is introduced to avoid that but return ENOTSUP.
Such allocate requests are also implemented to possibly overlap with the
other requests. No wait is performed but an error returned in such case as well.
So the operation should be considered advisory and a fallback scenario still
handled by the caller (in this case, qcow2 driver).

Anton Nefedov (12):
  mirror: inherit supported write/zero flags
  blkverify: set supported write/zero flags
  block: introduce BDRV_REQ_ALLOCATE flag
  block: treat BDRV_REQ_ALLOCATE as serialising
  file-posix: support BDRV_REQ_ALLOCATE
  block: support BDRV_REQ_ALLOCATE in passthrough drivers
  qcow2: move is_zero() up
  qcow2: skip writing zero buffers to empty COW areas
  qcow2: set inactive flag
  qcow2: do not zero out clusters if already preallocated
  iotest 198: test BDRV_REQ_ALLOCATE
  iotest 134: test cluster-misaligned encrypted write

Denis V. Lunev (2):
  qcow2: preallocation at image expand
  qcow2: truncate preallocated space

Pavel Butsykin (1):
  qcow2: check space leak at the end of the image

 block/qcow2.h                      |  18 ++++
 include/block/block.h              |   6 +-
 include/block/block_int.h          |   2 +-
 block/blkdebug.c                   |   3 +-
 block/blkverify.c                  |   9 ++
 block/file-posix.c                 |   8 ++
 block/io.c                         |  47 +++++++--
 block/mirror.c                     |   5 +
 block/qcow2-cluster.c              |  14 ++-
 block/qcow2-refcount.c             |   7 ++
 block/qcow2.c                      | 202 +++++++++++++++++++++++++++++++++----
 block/raw-format.c                 |   3 +-
 block/trace-events                 |   1 +
 qemu-options.hx                    |   4 +
 tests/qemu-iotests/026.out         |  94 +++++++++++++----
 tests/qemu-iotests/026.out.nocache |  94 +++++++++++++----
 tests/qemu-iotests/029.out         |   5 +-
 tests/qemu-iotests/060             |   2 +-
 tests/qemu-iotests/060.out         |  13 ++-
 tests/qemu-iotests/061.out         |   5 +-
 tests/qemu-iotests/066             |   2 +-
 tests/qemu-iotests/066.out         |   9 +-
 tests/qemu-iotests/098.out         |   7 +-
 tests/qemu-iotests/108.out         |   5 +-
 tests/qemu-iotests/112.out         |   5 +-
 tests/qemu-iotests/134             |   9 ++
 tests/qemu-iotests/134.out         |  10 ++
 tests/qemu-iotests/198             | 146 +++++++++++++++++++++++++++
 tests/qemu-iotests/198.out         |  50 +++++++++
 tests/qemu-iotests/group           |   1 +
 30 files changed, 693 insertions(+), 93 deletions(-)
 create mode 100755 tests/qemu-iotests/198
 create mode 100644 tests/qemu-iotests/198.out

-- 
2.7.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
@ 2017-11-01 15:43 ` Anton Nefedov
  2018-01-11 14:14   ` Alberto Garcia
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 02/15] blkverify: set " Anton Nefedov
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index 307b639..19c8987 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1064,6 +1064,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
     bdrv_refresh_filename(bs->backing->bs);
     pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
             bs->backing->bs->filename);
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->backing->bs->supported_write_flags;
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->backing->bs->supported_zero_flags;
 }
 
 static void bdrv_mirror_top_close(BlockDriverState *bs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 02/15] blkverify: set supported write/zero flags
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags Anton Nefedov
@ 2017-11-01 15:43 ` Anton Nefedov
  2018-01-11 14:17   ` Alberto Garcia
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/blkverify.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9..9ba65d0 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -140,6 +140,15 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags &
+        s->test_file->bs->supported_write_flags;
+
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags &
+        s->test_file->bs->supported_zero_flags;
+
     ret = 0;
 fail:
     qemu_opts_del(opts);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 03/15] block: introduce BDRV_REQ_ALLOCATE flag
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 02/15] blkverify: set " Anton Nefedov
@ 2017-11-01 15:43 ` Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 04/15] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

The flag is supposed to indicate that the region of the disk image has
to be sufficiently allocated so it reads as zeroes.

The call with the flag set must return -ENOTSUP if allocation cannot
be done efficiently.
This has to be made sure of by both
  - the drivers that support the flag
  - and the common block layer (so it will not fall back to any slowpath
    (like writing zero buffers) in case the driver does not support
    the flag).

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 include/block/block.h     |  6 +++++-
 include/block/block_int.h |  2 +-
 block/io.c                | 20 +++++++++++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index fbc21da..a291db9 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -65,9 +65,13 @@ typedef enum {
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
+    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
+     * efficiently allocate the space so it reads as zeroes, or return an error.
+     */
+    BDRV_REQ_ALLOCATE           = 0x40,
 
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x3f,
+    BDRV_REQ_MASK               = 0x7f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a548277..3799f5f 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -632,7 +632,7 @@ struct BlockDriverState {
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
     unsigned int supported_write_flags;
     /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
-     * BDRV_REQ_MAY_UNMAP) */
+     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
     unsigned int supported_zero_flags;
 
     /* the following member gives a name to every node on the bs graph. */
diff --git a/block/io.c b/block/io.c
index 3d5ef2c..7506207 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1334,7 +1334,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
             assert(!bs->supported_zero_flags);
         }
 
-        if (ret == -ENOTSUP) {
+        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
 
@@ -1420,8 +1420,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
     if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
-        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
-        qemu_iovec_is_zero(qiov)) {
+        !(flags & BDRV_REQ_ZERO_WRITE) && !(flags & BDRV_REQ_ALLOCATE) &&
+        drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
         flags |= BDRV_REQ_ZERO_WRITE;
         if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
             flags |= BDRV_REQ_MAY_UNMAP;
@@ -1499,6 +1499,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
     assert(flags & BDRV_REQ_ZERO_WRITE);
     if (head_padding_bytes || tail_padding_bytes) {
+        if (flags & BDRV_REQ_ALLOCATE) {
+            return -ENOTSUP;
+        }
         buf = qemu_blockalign(bs, align);
         iov = (struct iovec) {
             .iov_base   = buf,
@@ -1599,6 +1602,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         return ret;
     }
 
+    /* allocation request with qiov provided doesn't make much sense */
+    assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
+
     bdrv_inc_in_flight(bs);
     /*
      * Align write if necessary by performing a read-modify-write cycle.
@@ -1728,6 +1734,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
     trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
 
+    assert(!((flags & BDRV_REQ_MAY_UNMAP) && (flags & BDRV_REQ_ALLOCATE)));
+
+    if ((flags & BDRV_REQ_ALLOCATE) &&
+        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+    {
+        return -ENOTSUP;
+    }
+
     if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 04/15] block: treat BDRV_REQ_ALLOCATE as serialising
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (2 preceding siblings ...)
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2017-11-01 15:43 ` Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

The idea is that ALLOCATE requests may overlap with other requests.
Reuse the existing block layer infrastructure for serialising requests.
Use the following approach:
  - mark ALLOCATE serialising, so subsequent requests to the area wait
  - ALLOCATE request itself must never wait if another request is in flight
    already. Return EAGAIN, let the caller reconsider.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/io.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7506207..1e07a0b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -535,7 +535,8 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
     bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
+                                                   bool nowait)
 {
     BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
@@ -566,11 +567,14 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                  * will wait for us as soon as it wakes up, then just go on
                  * (instead of producing a deadlock in the former case). */
                 if (!req->waiting_for) {
+                    waited = true;
+                    if (nowait) {
+                        break;
+                    }
                     self->waiting_for = req;
                     qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
                     self->waiting_for = NULL;
                     retry = true;
-                    waited = true;
                     break;
                 }
             }
@@ -1120,7 +1124,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
     }
 
     if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
     }
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1410,7 +1414,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
-    waited = wait_serialising_requests(req);
+    waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
+    if (waited && flags & BDRV_REQ_ALLOCATE) {
+        return -EAGAIN;
+    }
     assert(!waited || !req->serialising);
     assert(req->overlap_offset <= offset);
     assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
@@ -1514,7 +1521,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
         /* RMW the unaligned part before head. */
         mark_request_serialising(req, align);
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_HEAD);
         ret = bdrv_aligned_preadv(child, req, offset & ~(align - 1), align,
                                   align, &local_qiov, 0);
@@ -1534,6 +1541,10 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         bytes -= zero_bytes;
     }
 
+    if (flags & BDRV_REQ_ALLOCATE) {
+        mark_request_serialising(req, align);
+    }
+
     assert(!bytes || (offset & (align - 1)) == 0);
     if (bytes >= align) {
         /* Write the aligned part in the middle. */
@@ -1552,7 +1563,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         assert(align == tail_padding_bytes + bytes);
         /* RMW the unaligned part after tail. */
         mark_request_serialising(req, align);
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_TAIL);
         ret = bdrv_aligned_preadv(child, req, offset, align,
                                   align, &local_qiov, 0);
@@ -1623,7 +1634,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         struct iovec head_iov;
 
         mark_request_serialising(&req, align);
-        wait_serialising_requests(&req);
+        wait_serialising_requests(&req, false);
 
         head_buf = qemu_blockalign(bs, align);
         head_iov = (struct iovec) {
@@ -1664,7 +1675,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         bool waited;
 
         mark_request_serialising(&req, align);
-        waited = wait_serialising_requests(&req);
+        waited = wait_serialising_requests(&req, false);
         assert(!waited || !use_local_qiov);
 
         tail_buf = qemu_blockalign(bs, align);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 05/15] file-posix: support BDRV_REQ_ALLOCATE
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (3 preceding siblings ...)
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 04/15] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2017-11-01 15:43 ` Anton Nefedov
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Current write_zeroes implementation is good enough to satisfy this flag too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/file-posix.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 36ee89e..c36e156 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -558,7 +558,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
     if (S_ISREG(st.st_mode)) {
         s->discard_zeroes = true;
+#ifdef CONFIG_FALLOCATE
         s->has_fallocate = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
+#endif
     }
     if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -593,6 +596,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
         s->is_xfs = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
     }
 #endif
 
@@ -1413,6 +1417,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
         }
         s->has_fallocate = false;
     }
+
+    if (!s->has_fallocate) {
+        aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
+    }
 #endif
 
     return -ENOTSUP;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (4 preceding siblings ...)
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2017-11-01 15:43 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 07/15] qcow2: move is_zero() up Anton Nefedov
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Support the flag if the underlying BDS supports it

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/blkdebug.c   | 3 ++-
 block/blkverify.c  | 2 +-
 block/mirror.c     | 2 +-
 block/raw-format.c | 3 ++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index e216699..7d5773d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -400,7 +400,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
     ret = -EINVAL;
 
diff --git a/block/blkverify.c b/block/blkverify.c
index 9ba65d0..b249636 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -145,7 +145,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         s->test_file->bs->supported_write_flags;
 
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags &
         s->test_file->bs->supported_zero_flags;
 
diff --git a/block/mirror.c b/block/mirror.c
index 19c8987..404c3e9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1067,7 +1067,7 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->backing->bs->supported_write_flags;
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->backing->bs->supported_zero_flags;
 }
 
diff --git a/block/raw-format.c b/block/raw-format.c
index ab552c0..b1deb93 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -416,7 +416,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     bs->sg = bs->file->bs->sg;
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
 
     if (bs->probed && !bdrv_is_read_only(bs)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 07/15] qcow2: move is_zero() up
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (5 preceding siblings ...)
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2018-01-15 15:14   ` Alberto Garcia
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

To be used in the following commit without a forward declaration.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 92cb9f9..db759ce 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1881,6 +1881,23 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
     return false;
 }
 
+static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+    int64_t nr;
+    int res;
+
+    /* Clamp to image length, before checking status of underlying sectors */
+    if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - offset;
+    }
+
+    if (!bytes) {
+        return true;
+    }
+    res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -2998,24 +3015,6 @@ finish:
     return ret;
 }
 
-
-static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
-{
-    int64_t nr;
-    int res;
-
-    /* Clamp to image length, before checking status of underlying sectors */
-    if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
-        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - offset;
-    }
-
-    if (!bytes) {
-        return true;
-    }
-    res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
-}
-
 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (6 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 07/15] qcow2: move is_zero() up Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2018-01-15 15:31   ` Alberto Garcia
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 09/15] qcow2: preallocation at image expand Anton Nefedov
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

If COW areas of the newly allocated clusters are zeroes on the backing image,
efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
cluster instead of writing explicit zero buffers later in perform_cow().

iotest 060:
write to the discarded cluster does not trigger COW anymore.
so, break on write_aio event instead, will work for the test
(but write won't fail anymore, so update reference output)

iotest 066:
cluster-alignment areas that were not really COWed are now detected
as zeroes, hence the initial write has to be exactly the same size for
the maps to match

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h              |  6 +++++
 block/qcow2-cluster.c      |  3 ++-
 block/qcow2.c              | 67 ++++++++++++++++++++++++++++++++++++++++++++--
 block/trace-events         |  1 +
 tests/qemu-iotests/060     |  2 +-
 tests/qemu-iotests/060.out |  3 ++-
 tests/qemu-iotests/066     |  2 +-
 tests/qemu-iotests/066.out |  4 +--
 8 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 782a206..e312e48 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -377,6 +377,12 @@ typedef struct QCowL2Meta
     Qcow2COWRegion cow_end;
 
     /**
+     * Indicates that both COW areas are empty (nb_bytes == 0)
+     * or filled with zeroes and do not require any more copying
+     */
+    bool zero_cow;
+
+    /**
      * The I/O vector with the data from the actual guest write request.
      * If non-NULL, this is meant to be merged together with the data
      * from @cow_start and @cow_end into one single write operation.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index fb10e26..e364116 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -783,7 +783,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->offset + start->nb_bytes <= end->offset);
     assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
-    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
+    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->zero_cow) {
         return 0;
     }
 
@@ -1383,6 +1383,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
             .offset     = nb_bytes,
             .nb_bytes   = avail_bytes - nb_bytes,
         },
+        .zero_cow = false,
     };
     qemu_co_queue_init(&(*m)->dependent_requests);
     QLIST_INSERT_HEAD(&s->cluster_allocs, *m, next_in_flight);
diff --git a/block/qcow2.c b/block/qcow2.c
index db759ce..827058d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1856,6 +1856,11 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
             continue;
         }
 
+        /* If both COW regions are zeroes already, skip this too */
+        if (m->zero_cow) {
+            continue;
+        }
+
         /* The data (middle) region must be immediately after the
          * start region */
         if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
@@ -1898,6 +1903,57 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
     return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
 }
 
+static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+{
+    if (bs->encrypted) {
+        return false;
+    }
+
+    if (m->cow_start.nb_bytes != 0 &&
+        !is_zero(bs, m->offset + m->cow_start.offset, m->cow_start.nb_bytes))
+    {
+        return false;
+    }
+
+    if (m->cow_end.nb_bytes != 0 &&
+        !is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes))
+    {
+        return false;
+    }
+
+    return true;
+}
+
+static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        int ret;
+
+        if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
+            continue;
+        }
+
+        if (!is_zero_cow(bs, m)) {
+            continue;
+        }
+
+        /* instead of writing zero COW buffers,
+           efficiently zero out the whole clusters */
+        ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
+                                    m->nb_clusters * s->cluster_size,
+                                    BDRV_REQ_ALLOCATE);
+        if (ret < 0) {
+            continue;
+        }
+
+        trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
+        m->zero_cow = true;
+    }
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1980,24 +2036,31 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
+        qemu_co_mutex_unlock(&s->lock);
+
+        if (bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE) {
+            handle_alloc_space(bs, l2meta);
+        }
+
         /* If we need to do COW, check if it's possible to merge the
          * writing of the guest data together with that of the COW regions.
          * If it's not possible (or not necessary) then write the
          * guest data now. */
         if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
-            qemu_co_mutex_unlock(&s->lock);
             BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
             trace_qcow2_writev_data(qemu_coroutine_self(),
                                     cluster_offset + offset_in_cluster);
             ret = bdrv_co_pwritev(bs->file,
                                   cluster_offset + offset_in_cluster,
                                   cur_bytes, &hd_qiov, 0);
-            qemu_co_mutex_lock(&s->lock);
             if (ret < 0) {
+                qemu_co_mutex_lock(&s->lock);
                 goto fail;
             }
         }
 
+        qemu_co_mutex_lock(&s->lock);
+
         while (l2meta != NULL) {
             QCowL2Meta *next;
 
diff --git a/block/trace-events b/block/trace-events
index 11c8d5f..c9fa596 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
 qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
 qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
 qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
+qcow2_skip_cow(void* co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
 
 # block/qcow2-cluster.c
 qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 8e95c45..3a0f096 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -160,7 +160,7 @@ poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
 # any unallocated cluster, leading to an attempt to overwrite the second L2
 # table. Finally, resume the COW write and see it fail (but not crash).
 echo "open -o file.driver=blkdebug $TEST_IMG
-break cow_read 0
+break write_aio 0
 aio_write 0k 1k
 wait_break 0
 write 64k 64k
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 5ca3af4..736dc30 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -108,7 +108,8 @@ qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps
 blkdebug: Suspended request '0'
 write failed: Input/output error
 blkdebug: Resuming request '0'
-aio_write failed: No medium found
+wrote 1024/1024 bytes at offset 0
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 === Testing unallocated image header ===
 
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 8638217..3c216a1 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -71,7 +71,7 @@ echo
 _make_test_img $IMG_SIZE
 
 # Create data clusters (not aligned to an L2 table)
-$QEMU_IO -c 'write -P 42 1M 256k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -P 42 $(((1024 + 32) * 1024)) 192k" "$TEST_IMG" | _filter_qemu_io
 orig_map=$($QEMU_IMG map --output=json "$TEST_IMG")
 
 # Convert the data clusters to preallocated zero clusters
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 3d9da9b..093431e 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -19,8 +19,8 @@ Offset          Length          Mapped to       File
 === Writing to preallocated zero clusters ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67109376
-wrote 262144/262144 bytes at offset 1048576
-256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 1081344
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 262144/262144 bytes at offset 1048576
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 196608/196608 bytes at offset 1081344
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 09/15] qcow2: preallocation at image expand
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (7 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 10/15] qcow2: set inactive flag Anton Nefedov
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Denis V. Lunev,
	Anton Nefedov

From: "Denis V. Lunev" <den@openvz.org>

This patch adds image preallocation at expand to provide better locality
of QCOW2 image file and optimize this procedure for some distributed
storage where this procedure is slow.

Preallocation is not issued upon writing metadata clusters.

Possible conflicts are resolved by the common block layer code since
ALLOCATE requests are serialising.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h   |  3 +++
 block/qcow2.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx |  4 ++++
 3 files changed, 50 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index e312e48..9a2711e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -102,6 +102,7 @@
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
 #define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
+#define QCOW2_OPT_PREALLOC_SIZE "prealloc-size"
 
 typedef struct QCowHeader {
     uint32_t magic;
@@ -327,6 +328,8 @@ typedef struct BDRVQcow2State {
      * override) */
     char *image_backing_file;
     char *image_backing_format;
+
+    uint64_t prealloc_size;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
diff --git a/block/qcow2.c b/block/qcow2.c
index 827058d..3899524 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -674,6 +674,11 @@ static QemuOptsList qcow2_runtime_opts = {
         },
         BLOCK_CRYPTO_OPT_DEF_KEY_SECRET("encrypt.",
             "ID of secret providing qcow2 AES key or LUKS passphrase"),
+        {
+            .name = QCOW2_OPT_PREALLOC_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "Preallocation amount at image expand",
+        },
         { /* end of list */ }
     },
 };
@@ -1016,6 +1021,15 @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
         goto fail;
     }
 
+    s->prealloc_size =
+        ROUND_UP(qemu_opt_get_size_del(opts, QCOW2_OPT_PREALLOC_SIZE, 0),
+                 s->cluster_size);
+    if (s->prealloc_size &&
+        !(bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+    {
+        s->prealloc_size = 0;
+    }
+
     ret = 0;
 fail:
     QDECREF(encryptopts);
@@ -1924,6 +1938,31 @@ static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
     return true;
 }
 
+/*
+ * If the specified area is beyond EOF, allocates it + prealloc_size
+ * bytes ahead.
+ */
+static void coroutine_fn handle_prealloc(BlockDriverState *bs,
+                                         const QCowL2Meta *m)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t start = m->alloc_offset;
+    uint64_t end = start + m->nb_clusters * s->cluster_size;
+    int64_t flen = bdrv_getlength(bs->file->bs);
+
+    if (flen < 0) {
+        return;
+    }
+
+    if (end > flen) {
+        /* try to alloc host space in one chunk for better locality */
+        bdrv_co_pwrite_zeroes(bs->file, flen,
+                              QEMU_ALIGN_UP(end + s->prealloc_size - flen,
+                                            s->cluster_size),
+                              BDRV_REQ_ALLOCATE);
+    }
+}
+
 static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 {
     BDRVQcow2State *s = bs->opaque;
@@ -1932,6 +1971,10 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
 
+        if (s->prealloc_size) {
+            handle_prealloc(bs, m);
+        }
+
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
             continue;
         }
diff --git a/qemu-options.hx b/qemu-options.hx
index 3728e9b..a75a9fd 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -771,6 +771,10 @@ occasions where a cluster gets freed (on/off; default: off)
 Which overlap checks to perform for writes to the image
 (none/constant/cached/all; default: cached). For details or finer
 granularity control refer to the QAPI documentation of @code{blockdev-add}.
+
+@item prealloc-size
+The number of bytes that will be preallocated ahead at qcow2 file expansion
+(allocating a new cluster beyond the end of file).
 @end table
 
 Example 1:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 10/15] qcow2: set inactive flag
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (8 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 09/15] qcow2: preallocation at image expand Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 11/15] qcow2: truncate preallocated space Anton Nefedov
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Qcow2State and BlockDriverState flags have to be in sync

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 3899524..f41aaac 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2184,6 +2184,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
 
     if (result == 0) {
         qcow2_mark_clean(bs);
+        s->flags |= BDRV_O_INACTIVE;
     }
 
     return result;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 11/15] qcow2: truncate preallocated space
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (9 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 10/15] qcow2: set inactive flag Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 12/15] qcow2: check space leak at the end of the image Anton Nefedov
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Denis V. Lunev,
	Anton Nefedov

From: "Denis V. Lunev" <den@openvz.org>

This could be done after calculation of the end of data and metadata in
the qcow2 image.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h          | 3 +++
 block/qcow2-cluster.c  | 9 +++++++++
 block/qcow2-refcount.c | 7 +++++++
 block/qcow2.c          | 7 +++++++
 4 files changed, 26 insertions(+)

diff --git a/block/qcow2.h b/block/qcow2.h
index 9a2711e..579838d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -330,6 +330,7 @@ typedef struct BDRVQcow2State {
     char *image_backing_format;
 
     uint64_t prealloc_size;
+    uint64_t data_end;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -693,4 +694,6 @@ void qcow2_remove_persistent_dirty_bitmap(BlockDriverState *bs,
                                           const char *name,
                                           Error **errp);
 
+void qcow2_update_data_end(BlockDriverState *bs, uint64_t off);
+
 #endif
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e364116..9b24cab 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -2080,3 +2080,12 @@ fail:
     g_free(l1_table);
     return ret;
 }
+
+void qcow2_update_data_end(BlockDriverState *bs, uint64_t off)
+{
+    BDRVQcow2State *s = bs->opaque;
+
+    if (s->data_end < off) {
+        s->data_end = off;
+    }
+}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index aa3fd6c..5797666 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -834,6 +834,9 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
             ret = alloc_refcount_block(bs, cluster_index, &refcount_block);
             if (ret < 0) {
                 goto fail;
+            } else {
+                qcow2_update_data_end(bs, s->refcount_table_offset +
+                        s->refcount_table_size * sizeof(uint64_t));
             }
         }
         old_table_index = table_index;
@@ -971,6 +974,8 @@ retry:
         s->free_cluster_index - 1 > (INT64_MAX >> s->cluster_bits))
     {
         return -EFBIG;
+    } else {
+        qcow2_update_data_end(bs, s->free_cluster_index << s->cluster_bits);
     }
 
 #ifdef DEBUG_ALLOC2
@@ -1035,6 +1040,8 @@ int64_t qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset,
 
     if (ret < 0) {
         return ret;
+    } else {
+        qcow2_update_data_end(bs, offset + (nb_clusters << s->cluster_bits));
     }
 
     return i;
diff --git a/block/qcow2.c b/block/qcow2.c
index f41aaac..5e2b543 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1488,6 +1488,8 @@ static int qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
+    s->data_end = bdrv_getlength(bs->file->bs);
+
 #ifdef DEBUG_ALLOC
     {
         BdrvCheckResult result = {0};
@@ -2184,6 +2186,11 @@ static int qcow2_inactivate(BlockDriverState *bs)
 
     if (result == 0) {
         qcow2_mark_clean(bs);
+
+        /* truncate preallocated space */
+        if (!bs->read_only && s->data_end < bdrv_getlength(bs->file->bs)) {
+            bdrv_truncate(bs->file, s->data_end, PREALLOC_MODE_OFF, NULL);
+        }
         s->flags |= BDRV_O_INACTIVE;
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 12/15] qcow2: check space leak at the end of the image
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (10 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 11/15] qcow2: truncate preallocated space Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 13/15] qcow2: do not zero out clusters if already preallocated Anton Nefedov
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Pavel Butsykin,
	Denis V . Lunev, Anton Nefedov

From: Pavel Butsykin <pbutsykin@virtuozzo.com>

Preallocated space in the image may remain unused; the patch adds
the functionality to identify and fix it in the qcow2_check
to avoid wasting storage space on the host.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.c                      | 32 +++++++++++++
 tests/qemu-iotests/026.out         | 94 +++++++++++++++++++++++++++++---------
 tests/qemu-iotests/026.out.nocache | 94 +++++++++++++++++++++++++++++---------
 tests/qemu-iotests/029.out         |  5 +-
 tests/qemu-iotests/060.out         | 10 +++-
 tests/qemu-iotests/061.out         |  5 +-
 tests/qemu-iotests/066.out         |  5 +-
 tests/qemu-iotests/098.out         |  7 ++-
 tests/qemu-iotests/108.out         |  5 +-
 tests/qemu-iotests/112.out         |  5 +-
 10 files changed, 210 insertions(+), 52 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5e2b543..ef65b5f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -530,6 +530,33 @@ int qcow2_mark_consistent(BlockDriverState *bs)
     return 0;
 }
 
+static int qcow2_check_extra_preallocation(BlockDriverState *bs,
+    BdrvCheckResult *res, BdrvCheckMode fix)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t img_size = bdrv_getlength(bs->file->bs);
+
+    if (res->image_end_offset < img_size) {
+        uint64_t count =
+            DIV_ROUND_UP(img_size - res->image_end_offset, s->cluster_size);
+        fprintf(stderr, "%s space leaked at the end of the image %jd\n",
+                fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
+                img_size - res->image_end_offset);
+        res->leaks += count;
+        if (fix & BDRV_FIX_LEAKS) {
+            int ret = bdrv_truncate(bs->file, res->image_end_offset,
+                                    PREALLOC_MODE_OFF, NULL);
+            if (ret < 0) {
+                res->check_errors++;
+                return ret;
+            }
+            res->leaks_fixed += count;
+        }
+    }
+
+    return 0;
+}
+
 static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
                        BdrvCheckMode fix)
 {
@@ -538,6 +565,11 @@ static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
         return ret;
     }
 
+    ret = qcow2_check_extra_preallocation(bs, result, fix);
+    if (ret < 0) {
+        return ret;
+    }
+
     if (fix && result->check_errors == 0 && result->corruptions == 0) {
         ret = qcow2_mark_clean(bs);
         if (ret < 0) {
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 86a50a2..29cc41e 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -5,7 +5,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write -b
@@ -33,7 +36,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write -b
@@ -468,20 +474,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 5120
 
-55 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -489,19 +502,26 @@ Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 265728
 
-251 leaked clusters were found on the image.
+770 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write
@@ -520,20 +540,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 27648
 
-11 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -541,27 +568,35 @@ Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 21504
 
-23 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-11 leaked clusters were found on the image.
+268 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -569,27 +604,35 @@ Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-23 leaked clusters were found on the image.
+280 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-11 leaked clusters were found on the image.
+270 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -597,8 +640,9 @@ Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-23 leaked clusters were found on the image.
+282 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 
 === L1 growth tests ===
@@ -651,7 +695,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: on
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: off
@@ -665,7 +712,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: on
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: off
diff --git a/tests/qemu-iotests/026.out.nocache b/tests/qemu-iotests/026.out.nocache
index ea2e166..42b258b 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -5,7 +5,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 5; imm: off; once: on; write 
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 5; imm: off; once: on; write -b
@@ -33,7 +36,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_update; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 1024
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 28; imm: off; once: on; write -b
@@ -476,20 +482,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 5120
 
-55 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -497,19 +510,26 @@ Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 265728
 
-251 leaked clusters were found on the image.
+770 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 394240
+
+770 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write 
@@ -528,20 +548,27 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 27648
 
-11 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -549,27 +576,35 @@ Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 21504
 
-23 leaked clusters were found on the image.
+65 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-11 leaked clusters were found on the image.
+268 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -577,27 +612,35 @@ Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 131584
 
-23 leaked clusters were found on the image.
+280 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write 
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: on; write -b
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 33280
+
+65 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write 
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-11 leaked clusters were found on the image.
+270 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -605,8 +648,9 @@ Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write -b
 Failed to flush the L2 table cache: No space left on device
 Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
+ERROR space leaked at the end of the image 132608
 
-23 leaked clusters were found on the image.
+282 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 
 === L1 growth tests ===
@@ -659,7 +703,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: on
 write failed: Input/output error
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: off
@@ -673,7 +720,10 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: on
 write failed: No space left on device
-No errors were found on the image.
+ERROR space leaked at the end of the image 98304
+
+96 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: off
diff --git a/tests/qemu-iotests/029.out b/tests/qemu-iotests/029.out
index 5bc93e0..7176e38 100644
--- a/tests/qemu-iotests/029.out
+++ b/tests/qemu-iotests/029.out
@@ -6,7 +6,10 @@ is smaller than the current L1 table.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 wrote 4096/4096 bytes at offset 0
 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 65536
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=16777216
 wrote 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 736dc30..290ccec 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -78,20 +78,26 @@ Leaked cluster 9 refcount=1 reference=0
 Repairing cluster 4 refcount=1 reference=2
 Repairing cluster 9 refcount=1 reference=0
 Repairing OFLAG_COPIED data cluster: l2_entry=8000000000040000 refcount=2
+ERROR space leaked at the end of the image 65536
 The following inconsistencies were found and repaired:
 
     1 leaked clusters
     2 corruptions
 
 Double checking the fixed image now...
-No errors were found on the image.
+
+1 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 incompatible_features     0x0
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 incompatible_features     0x0
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 131072
+
+2 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 read 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 942485d..87aaebf 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -328,7 +328,10 @@ wrote 131072/131072 bytes at offset 0
 No errors were found on the image.
 read 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 196608
+
+3 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 read 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 093431e..81ef795 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -13,7 +13,10 @@ discard 67109376/67109376 bytes at offset 0
 64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 67109376/67109376 bytes at offset 0
 64 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-No errors were found on the image.
+ERROR space leaked at the end of the image 327680
+
+5 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Offset          Length          Mapped to       File
 
 === Writing to preallocated zero clusters ===
diff --git a/tests/qemu-iotests/098.out b/tests/qemu-iotests/098.out
index 7634d0e..733249f 100644
--- a/tests/qemu-iotests/098.out
+++ b/tests/qemu-iotests/098.out
@@ -20,7 +20,10 @@ Leaked cluster 4 refcount=1 reference=0
 Leaked cluster 5 refcount=1 reference=0
 Repairing cluster 4 refcount=1 reference=0
 Repairing cluster 5 refcount=1 reference=0
-No errors were found on the image.
+ERROR space leaked at the end of the image 131072
+
+2 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 
 === reftable_update ===
 
@@ -34,6 +37,7 @@ ERROR cluster 1 refcount=0 reference=1
 ERROR cluster 3 refcount=0 reference=1
 Rebuilding refcount structure
 Repairing cluster 1 refcount=1 reference=0
+Repairing space leaked at the end of the image 65536
 No errors were found on the image.
 
 === refblock_alloc ===
@@ -48,5 +52,6 @@ ERROR cluster 1 refcount=0 reference=1
 ERROR cluster 3 refcount=0 reference=1
 Rebuilding refcount structure
 Repairing cluster 1 refcount=1 reference=0
+Repairing space leaked at the end of the image 65536
 No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/108.out b/tests/qemu-iotests/108.out
index 75bab8d..d178b1e 100644
--- a/tests/qemu-iotests/108.out
+++ b/tests/qemu-iotests/108.out
@@ -68,13 +68,16 @@ Rebuilding refcount structure
 Repairing cluster 1 refcount=1 reference=0
 Repairing cluster 2 refcount=1 reference=0
 Repairing cluster 16 refcount=1 reference=0
+ERROR space leaked at the end of the image 720896
 The following inconsistencies were found and repaired:
 
     0 leaked clusters
     2 corruptions
 
 Double checking the fixed image now...
-No errors were found on the image.
+
+11 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 
 --- Signed overflow after the refblock ---
 
diff --git a/tests/qemu-iotests/112.out b/tests/qemu-iotests/112.out
index 81b04d1..7386be8 100644
--- a/tests/qemu-iotests/112.out
+++ b/tests/qemu-iotests/112.out
@@ -93,7 +93,10 @@ refcount bits: 1
 
 === Amend from refcount_bits=1 to refcount_bits=64 ===
 
-No errors were found on the image.
+ERROR space leaked at the end of the image 131072
+
+2 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 refcount bits: 64
 
 === Amend to compat=0.10 ===
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 13/15] qcow2: do not zero out clusters if already preallocated
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (11 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 12/15] qcow2: check space leak at the end of the image Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 14/15] iotest 198: test BDRV_REQ_ALLOCATE Anton Nefedov
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

if space preallocation feature is used, it can be detected that the space
is already zeroes, so COW can be skipped without additional zeroing of
the clusters.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/qcow2.h         |  6 ++++++
 block/qcow2-cluster.c |  2 ++
 block/qcow2.c         | 45 +++++++++++++++++++++++++++++++--------------
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 579838d..5d96748 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -363,6 +363,12 @@ typedef struct QCowL2Meta
     bool keep_old_clusters;
 
     /**
+     * True if the area is allocated at the end of data area
+     * (i.e. >= BDRVQcow2State::data_end)
+     */
+    bool clusters_are_trailing;
+
+    /**
      * Requests that overlap with this allocation and wait to be restarted
      * when the allocating request has completed.
      */
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 9b24cab..5698919 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1251,6 +1251,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
     uint64_t *host_offset, uint64_t *bytes, QCowL2Meta **m)
 {
     BDRVQcow2State *s = bs->opaque;
+    const uint64_t old_data_end = s->data_end;
     int l2_index;
     uint64_t *l2_table;
     uint64_t entry;
@@ -1372,6 +1373,7 @@ static int handle_alloc(BlockDriverState *bs, uint64_t guest_offset,
         .alloc_offset   = alloc_cluster_offset,
         .offset         = start_of_cluster(s, guest_offset),
         .nb_clusters    = nb_clusters,
+        .clusters_are_trailing = alloc_cluster_offset >= old_data_end,
 
         .keep_old_clusters  = keep_old_clusters,
 
diff --git a/block/qcow2.c b/block/qcow2.c
index ef65b5f..6019cf4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1975,26 +1975,40 @@ static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
 /*
  * If the specified area is beyond EOF, allocates it + prealloc_size
  * bytes ahead.
+ *
+ * Returns
+ *   true if the space is allocated and contains zeroes
  */
-static void coroutine_fn handle_prealloc(BlockDriverState *bs,
+static bool coroutine_fn handle_prealloc(BlockDriverState *bs,
                                          const QCowL2Meta *m)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t start = m->alloc_offset;
     uint64_t end = start + m->nb_clusters * s->cluster_size;
+    int ret;
     int64_t flen = bdrv_getlength(bs->file->bs);
 
     if (flen < 0) {
-        return;
+        return false;
     }
 
     if (end > flen) {
         /* try to alloc host space in one chunk for better locality */
-        bdrv_co_pwrite_zeroes(bs->file, flen,
-                              QEMU_ALIGN_UP(end + s->prealloc_size - flen,
-                                            s->cluster_size),
-                              BDRV_REQ_ALLOCATE);
+        ret = bdrv_co_pwrite_zeroes(bs->file, flen,
+                                    QEMU_ALIGN_UP(end + s->prealloc_size - flen,
+                                                  s->cluster_size),
+                                    BDRV_REQ_ALLOCATE);
+        if (ret < 0) {
+            return false;
+        }
     }
+
+    /* We're safe to assume that the area is zeroes if the area
+     * was allocated at the end of data (s->data_end).
+     * In this case, the only way for file length to be bigger is that
+     * the area was preallocated by this or another request.
+     */
+    return m->clusters_are_trailing;
 }
 
 static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
@@ -2004,9 +2018,10 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
 
     for (m = l2meta; m != NULL; m = m->next) {
         int ret;
+        bool preallocated_zeroes = false;
 
         if (s->prealloc_size) {
-            handle_prealloc(bs, m);
+            preallocated_zeroes = handle_prealloc(bs, m);
         }
 
         if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
@@ -2017,13 +2032,15 @@ static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
             continue;
         }
 
-        /* instead of writing zero COW buffers,
-           efficiently zero out the whole clusters */
-        ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
-                                    m->nb_clusters * s->cluster_size,
-                                    BDRV_REQ_ALLOCATE);
-        if (ret < 0) {
-            continue;
+        if (!preallocated_zeroes) {
+            /* instead of writing zero COW buffers,
+               efficiently zero out the whole clusters */
+            ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
+                                        m->nb_clusters * s->cluster_size,
+                                        BDRV_REQ_ALLOCATE);
+            if (ret < 0) {
+                continue;
+            }
         }
 
         trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 14/15] iotest 198: test BDRV_REQ_ALLOCATE
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (12 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 13/15] qcow2: do not zero out clusters if already preallocated Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 15/15] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  2017-12-04 22:32 ` [Qemu-devel] [Qemu-block] [PATCH v5 00/15] qcow2: space preallocation and COW improvements John Snow
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 tests/qemu-iotests/198     | 146 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/198.out |  50 ++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 197 insertions(+)
 create mode 100755 tests/qemu-iotests/198
 create mode 100644 tests/qemu-iotests/198.out

diff --git a/tests/qemu-iotests/198 b/tests/qemu-iotests/198
new file mode 100755
index 0000000..ad7162a
--- /dev/null
+++ b/tests/qemu-iotests/198
@@ -0,0 +1,146 @@
+#!/bin/env bash
+#
+# Test qcow2 BDRV_REQ_ALLOCATE requests
+#
+# Copyright (c) 2017 Parallels International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+##
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+function do_io()
+{
+    $QEMU_IO "$@" | _filter_qemu_io |\
+        sed -e 's/bytes at offset [0-9]*/bytes at offset XXX/g'
+}
+
+CLUSTER_SIZE=64k
+size=128M
+
+_make_test_img 1G
+
+echo
+echo "== Test discarded cluster reuse =="
+
+# allocate first two clusters
+do_io -c "writev -P 1 0x8000 0x10000" "$TEST_IMG"
+len=$(stat -c "%s" $TEST_IMG)
+
+# discard the 1st cluster on qcow2 level only
+do_io -c "open -o pass-discard-request=off $TEST_IMG" -c "discard 0 0x10000"
+
+# new write will reuse the dirty host cluster and has to overwrite that
+#  with zeroes
+do_io -c "writev -P 2 0x24000 0x8000" "$TEST_IMG"
+if [ $len -ne $(stat -c "%s" $TEST_IMG) ] ; then
+    >&2 echo "Failed to reuse cluster"
+    exit 1
+fi
+
+echo
+echo "== Test preallocation =="
+
+function io_commands()
+{
+    echo "open -o prealloc-size=$((1024*1024)) blkdebug::$TEST_IMG"
+
+    # Verify that intersections of a running preallocation and new requests
+    #  is handled properly.
+    #
+    # 1. send a write #1 which triggers preallocation, suspend it in action
+    # 2. send a write #2 which intersects with the area being preallocated
+    # 3. using break/wait_break/resume, wait until write #2 is at least
+    #    at WRITE_AIO tracepoint.
+    #    Then it is supposed to enter pwrite(bs->child) and start waiting
+    #    for #1 to finish
+    # 4. resume #1
+
+cat <<EOF
+break pwritev_zero A
+aio_write -P 3 0x30000 0x1000
+wait_break A
+
+break write_aio B
+aio_write -P 4 0x40000 0x1000
+wait_break B
+resume B
+
+resume A
+aio_flush
+EOF
+
+    # Verify that new cluster in the preallocated area triggers
+    #  neither new preallocation nor COW read
+    #
+    # TODO: this test will not fail but hang. Better ideas?
+    #       wait and kill by timeout?
+
+cat <<EOF
+break pwritev_zero A
+break cow_read B
+writev -P 5 0x51000 0x1000
+EOF
+}
+
+io_commands | do_io
+
+echo
+echo "== Verify image content =="
+
+function verify_io()
+{
+cat <<EOF
+read -P 0 0 0x10000
+read -P 1 0x10000 0x8000
+read -P 0 0x18000 0xc000
+read -P 2 0x24000 0x8000
+read -P 0 0x2c000 0x4000
+
+read -P 3 0x30000 0x1000
+read -P 0 0x31000 0xf000
+read -P 4 0x40000 0x1000
+read -P 0 0x41000 0xf000
+
+read -P 0 0x50000 0x1000
+read -P 5 0x51000 0x1000
+read -P 0 0x52000 0xe000
+EOF
+}
+
+verify_io | do_io $TEST_IMG
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/198.out b/tests/qemu-iotests/198.out
new file mode 100644
index 0000000..b4eb27f
--- /dev/null
+++ b/tests/qemu-iotests/198.out
@@ -0,0 +1,50 @@
+QA output created by 198
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+
+== Test discarded cluster reuse ==
+wrote 65536/65536 bytes at offset XXX
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 65536/65536 bytes at offset XXX
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset XXX
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Test preallocation ==
+blkdebug: Suspended request 'A'
+blkdebug: Suspended request 'B'
+blkdebug: Resuming request 'B'
+blkdebug: Resuming request 'A'
+wrote 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== Verify image content ==
+read 65536/65536 bytes at offset XXX
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset XXX
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 49152/49152 bytes at offset XXX
+48 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset XXX
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16384/16384 bytes at offset XXX
+16 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 61440/61440 bytes at offset XXX
+60 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 61440/61440 bytes at offset XXX
+60 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset XXX
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 57344/57344 bytes at offset XXX
+56 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 24e5ad1..6263608 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -194,3 +194,4 @@
 194 rw auto migration quick
 195 rw auto quick
 197 rw auto quick
+198 rw auto quick
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH v5 15/15] iotest 134: test cluster-misaligned encrypted write
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (13 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 14/15] iotest 198: test BDRV_REQ_ALLOCATE Anton Nefedov
@ 2017-11-01 15:44 ` Anton Nefedov
  2017-12-04 22:32 ` [Qemu-devel] [Qemu-block] [PATCH v5 00/15] qcow2: space preallocation and COW improvements John Snow
  15 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-11-01 15:44 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 9914415..6083ae4 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -59,6 +59,15 @@ echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
 echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index 972be49..09d46f6 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -5,6 +5,16 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 00/15] qcow2: space preallocation and COW improvements
  2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
                   ` (14 preceding siblings ...)
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 15/15] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
@ 2017-12-04 22:32 ` John Snow
  2017-12-05 17:28   ` Anton Nefedov
  15 siblings, 1 reply; 24+ messages in thread
From: John Snow @ 2017-12-04 22:32 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: kwolf, den, qemu-block, mreitz

Friendly ping; this has gone over 30 days without a reply on the list. I
recommend rebasing and trying again after 2.11 hits.

Thanks,
--John

On 11/01/2017 11:43 AM, Anton Nefedov wrote:
> v5: rebased, patches slightly reordered;
>     cluster allocation (to avoid writing zero-buffers)
>     now goes before an optional ahead preallocation
> 
> v4: http://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg00109.html
> 
> This pull request is to improve a few performance problems of qcow2 format:
> 
>   1. non cluster-aligned write requests (to unallocated clusters) explicitly
>      pad data with zeroes if there is no backing data.
>      Resulting increase in ops number and potential cluster fragmentation
>      (on the host file) is already solved by:
>        ee22a9d qcow2: Merge the writing of the COW regions with the guest data
>      However, in case of zero COW regions, that can be avoided at all
>      but the whole clusters are preallocated and zeroed in a single
>      efficient write_zeroes() operation
> 
>   2. moreover, efficient write_zeroes() operation can be used to preallocate
>      space megabytes (*configurable number) ahead which gives noticeable
>      improvement on some storage types (e.g. distributed storage)
>      where the space allocation operation might be expensive)
> 
>   3. this will also allow to enable simultaneous writes to the same unallocated
>      cluster after the space has been allocated & zeroed but before
>      the first data is written and the cluster is linked to L2.
>      (Not included in this patchset).
> 
> Efficient write_zeroes usually implies that the blocks are not actually
> written to but only reserved and marked as zeroed by the storage.
> In this patchset, file-posix driver is marked as supporting this operation
> if it supports (/configured to support) fallocate() operation.
> 
> Existing bdrv_write_zeroes() falls back to writing zero buffers if
> write_zeroes is not supported by the driver.
> A new flag (BDRV_REQ_ALLOCATE) is introduced to avoid that but return ENOTSUP.
> Such allocate requests are also implemented to possibly overlap with the
> other requests. No wait is performed but an error returned in such case as well.
> So the operation should be considered advisory and a fallback scenario still
> handled by the caller (in this case, qcow2 driver).
> 
> Anton Nefedov (12):
>   mirror: inherit supported write/zero flags
>   blkverify: set supported write/zero flags
>   block: introduce BDRV_REQ_ALLOCATE flag
>   block: treat BDRV_REQ_ALLOCATE as serialising
>   file-posix: support BDRV_REQ_ALLOCATE
>   block: support BDRV_REQ_ALLOCATE in passthrough drivers
>   qcow2: move is_zero() up
>   qcow2: skip writing zero buffers to empty COW areas
>   qcow2: set inactive flag
>   qcow2: do not zero out clusters if already preallocated
>   iotest 198: test BDRV_REQ_ALLOCATE
>   iotest 134: test cluster-misaligned encrypted write
> 
> Denis V. Lunev (2):
>   qcow2: preallocation at image expand
>   qcow2: truncate preallocated space
> 
> Pavel Butsykin (1):
>   qcow2: check space leak at the end of the image
> 
>  block/qcow2.h                      |  18 ++++
>  include/block/block.h              |   6 +-
>  include/block/block_int.h          |   2 +-
>  block/blkdebug.c                   |   3 +-
>  block/blkverify.c                  |   9 ++
>  block/file-posix.c                 |   8 ++
>  block/io.c                         |  47 +++++++--
>  block/mirror.c                     |   5 +
>  block/qcow2-cluster.c              |  14 ++-
>  block/qcow2-refcount.c             |   7 ++
>  block/qcow2.c                      | 202 +++++++++++++++++++++++++++++++++----
>  block/raw-format.c                 |   3 +-
>  block/trace-events                 |   1 +
>  qemu-options.hx                    |   4 +
>  tests/qemu-iotests/026.out         |  94 +++++++++++++----
>  tests/qemu-iotests/026.out.nocache |  94 +++++++++++++----
>  tests/qemu-iotests/029.out         |   5 +-
>  tests/qemu-iotests/060             |   2 +-
>  tests/qemu-iotests/060.out         |  13 ++-
>  tests/qemu-iotests/061.out         |   5 +-
>  tests/qemu-iotests/066             |   2 +-
>  tests/qemu-iotests/066.out         |   9 +-
>  tests/qemu-iotests/098.out         |   7 +-
>  tests/qemu-iotests/108.out         |   5 +-
>  tests/qemu-iotests/112.out         |   5 +-
>  tests/qemu-iotests/134             |   9 ++
>  tests/qemu-iotests/134.out         |  10 ++
>  tests/qemu-iotests/198             | 146 +++++++++++++++++++++++++++
>  tests/qemu-iotests/198.out         |  50 +++++++++
>  tests/qemu-iotests/group           |   1 +
>  30 files changed, 693 insertions(+), 93 deletions(-)
>  create mode 100755 tests/qemu-iotests/198
>  create mode 100644 tests/qemu-iotests/198.out
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v5 00/15] qcow2: space preallocation and COW improvements
  2017-12-04 22:32 ` [Qemu-devel] [Qemu-block] [PATCH v5 00/15] qcow2: space preallocation and COW improvements John Snow
@ 2017-12-05 17:28   ` Anton Nefedov
  0 siblings, 0 replies; 24+ messages in thread
From: Anton Nefedov @ 2017-12-05 17:28 UTC (permalink / raw)
  To: John Snow, qemu-devel; +Cc: kwolf, den, qemu-block, mreitz

On 5/12/2017 1:32 AM, John Snow wrote:
> Friendly ping; this has gone over 30 days without a reply on the list. I
> recommend rebasing and trying again after 2.11 hits.
> 
> Thanks,
> --John
> 

Thank you; I also though of maybe splitting this series in two i.e.
patches ~1-8 and then the actual preallocation ahead.
Will do.

/Anton

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags Anton Nefedov
@ 2018-01-11 14:14   ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2018-01-11 14:14 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Wed 01 Nov 2017 04:43:54 PM CET, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v5 02/15] blkverify: set supported write/zero flags
  2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 02/15] blkverify: set " Anton Nefedov
@ 2018-01-11 14:17   ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2018-01-11 14:17 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Wed 01 Nov 2017 04:43:55 PM CET, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/15] qcow2: move is_zero() up
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 07/15] qcow2: move is_zero() up Anton Nefedov
@ 2018-01-15 15:14   ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2018-01-15 15:14 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Wed 01 Nov 2017 04:44:00 PM CET, Anton Nefedov wrote:
> To be used in the following commit without a forward declaration.
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas
  2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-01-15 15:31   ` Alberto Garcia
  2018-01-15 18:21     ` Anton Nefedov
  0 siblings, 1 reply; 24+ messages in thread
From: Alberto Garcia @ 2018-01-15 15:31 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Wed 01 Nov 2017 04:44:01 PM CET, Anton Nefedov wrote:
> If COW areas of the newly allocated clusters are zeroes on the backing image,
> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
> cluster instead of writing explicit zero buffers later in perform_cow().
>
> iotest 060:
> write to the discarded cluster does not trigger COW anymore.
> so, break on write_aio event instead, will work for the test
> (but write won't fail anymore, so update reference output)
>
> iotest 066:
> cluster-alignment areas that were not really COWed are now detected
> as zeroes, hence the initial write has to be exactly the same size for
> the maps to match
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  block/qcow2.h              |  6 +++++
>  block/qcow2-cluster.c      |  3 ++-
>  block/qcow2.c              | 67 ++++++++++++++++++++++++++++++++++++++++++++--
>  block/trace-events         |  1 +
>  tests/qemu-iotests/060     |  2 +-
>  tests/qemu-iotests/060.out |  3 ++-
>  tests/qemu-iotests/066     |  2 +-
>  tests/qemu-iotests/066.out |  4 +--
>  8 files changed, 80 insertions(+), 8 deletions(-)
>
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 782a206..e312e48 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -377,6 +377,12 @@ typedef struct QCowL2Meta
>      Qcow2COWRegion cow_end;
>  
>      /**
> +     * Indicates that both COW areas are empty (nb_bytes == 0)
> +     * or filled with zeroes and do not require any more copying
> +     */
> +    bool zero_cow;

I think the terminology elsewhere is "COW regions".

Also, in this patch that field doesn't really seem to track the case
where both regions have nb_bytes == 0, or does it? It seems to be
checked separately in all cases.

Example:

> -    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
> +    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->zero_cow) {
>          return 0;
>      }

Here, 'if (m->zero_cow)' would suffice.

> +        /* If both COW regions are zeroes already, skip this too */
> +        if (m->zero_cow) {
> +            continue;
> +        }

Same as above

> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
> +{
> +    if (bs->encrypted) {
> +        return false;
> +    }
> +
> +    if (m->cow_start.nb_bytes != 0 &&
> +        !is_zero(bs, m->offset + m->cow_start.offset, m->cow_start.nb_bytes))
> +    {
> +        return false;
> +    }
> +
> +    if (m->cow_end.nb_bytes != 0 &&
> +        !is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes))
> +    {
> +        return false;
> +    }

Why do you need to check that nb_bytes != 0? Doesn't is_zero() do that
already?

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas
  2018-01-15 15:31   ` Alberto Garcia
@ 2018-01-15 18:21     ` Anton Nefedov
  2018-01-15 20:12       ` Alberto Garcia
  0 siblings, 1 reply; 24+ messages in thread
From: Anton Nefedov @ 2018-01-15 18:21 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: qemu-block, kwolf, mreitz, eblake, den, anton.nefedov



On 15/1/2018 6:31 PM, Alberto Garcia wrote:
> On Wed 01 Nov 2017 04:44:01 PM CET, Anton Nefedov wrote:
>> If COW areas of the newly allocated clusters are zeroes on the backing image,
>> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
>> cluster instead of writing explicit zero buffers later in perform_cow().
>>
>> iotest 060:
>> write to the discarded cluster does not trigger COW anymore.
>> so, break on write_aio event instead, will work for the test
>> (but write won't fail anymore, so update reference output)
>>
>> iotest 066:
>> cluster-alignment areas that were not really COWed are now detected
>> as zeroes, hence the initial write has to be exactly the same size for
>> the maps to match
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>>   block/qcow2.h              |  6 +++++
>>   block/qcow2-cluster.c      |  3 ++-
>>   block/qcow2.c              | 67 ++++++++++++++++++++++++++++++++++++++++++++--
>>   block/trace-events         |  1 +
>>   tests/qemu-iotests/060     |  2 +-
>>   tests/qemu-iotests/060.out |  3 ++-
>>   tests/qemu-iotests/066     |  2 +-
>>   tests/qemu-iotests/066.out |  4 +--
>>   8 files changed, 80 insertions(+), 8 deletions(-)
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 782a206..e312e48 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -377,6 +377,12 @@ typedef struct QCowL2Meta
>>       Qcow2COWRegion cow_end;
>>   
>>       /**
>> +     * Indicates that both COW areas are empty (nb_bytes == 0)
>> +     * or filled with zeroes and do not require any more copying
>> +     */
>> +    bool zero_cow;
> 
> I think the terminology elsewhere is "COW regions".
> 

Hi Berto! thanks much for checking this series

Will fix 'areas' to 'regions'

> Also, in this patch that field doesn't really seem to track the case
> where both regions have nb_bytes == 0, or does it? It seems to be
> checked separately in all cases.
> 
> Example:
> 
>> -    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>> +    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->zero_cow) {
>>           return 0;
>>       }
> 
> Here, 'if (m->zero_cow)' would suffice.
> 

The thing is, zero_cow is not assigned on some code paths,
e.g. !(bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE)
or when bs->encrypted.

but now thinking about this again; probably it should be - that will be
least confusing. I'll fix


>> +        /* If both COW regions are zeroes already, skip this too */
>> +        if (m->zero_cow) {
>> +            continue;
>> +        }
> 
> Same as above
> 

Yep.

>> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
>> +{
>> +    if (bs->encrypted) {
>> +        return false;
>> +    }
>> +
>> +    if (m->cow_start.nb_bytes != 0 &&
>> +        !is_zero(bs, m->offset + m->cow_start.offset, m->cow_start.nb_bytes))
>> +    {
>> +        return false;
>> +    }
>> +
>> +    if (m->cow_end.nb_bytes != 0 &&
>> +        !is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes))
>> +    {
>> +        return false;
>> +    }
> 
> Why do you need to check that nb_bytes != 0? Doesn't is_zero() do that
> already?
> 

I really don't; I think previously is_zero() didn't have that check
and I didn't notice it introduced during a rebase.

Will fix

> Berto
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas
  2018-01-15 18:21     ` Anton Nefedov
@ 2018-01-15 20:12       ` Alberto Garcia
  0 siblings, 0 replies; 24+ messages in thread
From: Alberto Garcia @ 2018-01-15 20:12 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 15 Jan 2018 07:21:01 PM CET, Anton Nefedov wrote:
>>>       /**
>>> +     * Indicates that both COW areas are empty (nb_bytes == 0)
>>> +     * or filled with zeroes and do not require any more copying
>>> +     */
>>> +    bool zero_cow;
[...]
>>> -    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>>> +    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->zero_cow) {
>>>           return 0;
>>>       }
>> 
>> Here, 'if (m->zero_cow)' would suffice.
>
> The thing is, zero_cow is not assigned on some code paths,
> e.g. !(bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE)
> or when bs->encrypted.

Right, that's why I said that the comment on the zero_cow field is
incorrect, zero_cow does not indicate when nb_bytes == 0.

> but now thinking about this again; probably it should be - that will
> be least confusing. I'll fix

Choose the simplest solution, as long as the documentation matches the
semantics.

Berto

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-01-15 20:12 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-01 15:43 [Qemu-devel] [PATCH v5 00/15] qcow2: space preallocation and COW improvements Anton Nefedov
2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 01/15] mirror: inherit supported write/zero flags Anton Nefedov
2018-01-11 14:14   ` Alberto Garcia
2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 02/15] blkverify: set " Anton Nefedov
2018-01-11 14:17   ` Alberto Garcia
2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 03/15] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 04/15] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 05/15] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
2017-11-01 15:43 ` [Qemu-devel] [PATCH v5 06/15] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 07/15] qcow2: move is_zero() up Anton Nefedov
2018-01-15 15:14   ` Alberto Garcia
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 08/15] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2018-01-15 15:31   ` Alberto Garcia
2018-01-15 18:21     ` Anton Nefedov
2018-01-15 20:12       ` Alberto Garcia
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 09/15] qcow2: preallocation at image expand Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 10/15] qcow2: set inactive flag Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 11/15] qcow2: truncate preallocated space Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 12/15] qcow2: check space leak at the end of the image Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 13/15] qcow2: do not zero out clusters if already preallocated Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 14/15] iotest 198: test BDRV_REQ_ALLOCATE Anton Nefedov
2017-11-01 15:44 ` [Qemu-devel] [PATCH v5 15/15] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
2017-12-04 22:32 ` [Qemu-devel] [Qemu-block] [PATCH v5 00/15] qcow2: space preallocation and COW improvements John Snow
2017-12-05 17:28   ` Anton Nefedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.