All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation
@ 2018-01-18 17:48 Anton Nefedov
  2018-01-18 17:48 ` [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags Anton Nefedov
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

(used to be 'qcow2: preallocation and COW improvements')

v7: patch 8/9:
    - is_zero_cow() function reworked
    - blkdbg event added
    - write-zeroes errors handled
    - iotest 60 fixed properly

v6: http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg03463.html

Anton Nefedov (9):
  mirror: inherit supported write/zero flags
  blkverify: set supported write/zero flags
  block: introduce BDRV_REQ_ALLOCATE flag
  block: treat BDRV_REQ_ALLOCATE as serialising
  file-posix: support BDRV_REQ_ALLOCATE
  block: support BDRV_REQ_ALLOCATE in passthrough drivers
  qcow2: move is_zero() up
  qcow2: skip writing zero buffers to empty COW areas
  iotest 134: test cluster-misaligned encrypted write

 qapi/block-core.json       |   4 +-
 block/qcow2.h              |   6 +++
 include/block/block.h      |   6 ++-
 include/block/block_int.h  |   2 +-
 block/blkdebug.c           |   3 +-
 block/blkverify.c          |   9 ++++
 block/file-posix.c         |   8 ++++
 block/io.c                 |  47 ++++++++++++++++-----
 block/mirror.c             |   5 +++
 block/qcow2-cluster.c      |   2 +-
 block/qcow2.c              | 101 ++++++++++++++++++++++++++++++++++++---------
 block/raw-format.c         |   3 +-
 block/trace-events         |   1 +
 tests/qemu-iotests/060     |  26 +++++++-----
 tests/qemu-iotests/060.out |   5 ++-
 tests/qemu-iotests/066     |   2 +-
 tests/qemu-iotests/066.out |   4 +-
 tests/qemu-iotests/134     |   9 ++++
 tests/qemu-iotests/134.out |  10 +++++
 19 files changed, 202 insertions(+), 51 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
@ 2018-01-18 17:48 ` Anton Nefedov
  2018-01-29 19:21   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 2/9] blkverify: set " Anton Nefedov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/mirror.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index c9badc1..d18ec65 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1064,6 +1064,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
     bdrv_refresh_filename(bs->backing->bs);
     pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
             bs->backing->bs->filename);
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->backing->bs->supported_write_flags;
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->backing->bs->supported_zero_flags;
 }
 
 static void bdrv_mirror_top_close(BlockDriverState *bs)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 2/9] blkverify: set supported write/zero flags
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
  2018-01-18 17:48 ` [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 19:23   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/blkverify.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9..9ba65d0 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -140,6 +140,15 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags &
+        s->test_file->bs->supported_write_flags;
+
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags &
+        s->test_file->bs->supported_zero_flags;
+
     ret = 0;
 fail:
     qemu_opts_del(opts);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
  2018-01-18 17:48 ` [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags Anton Nefedov
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 2/9] blkverify: set " Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 19:37   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

The flag is supposed to indicate that the region of the disk image has
to be sufficiently allocated so it reads as zeroes.

The call with the flag set must return -ENOTSUP if allocation cannot
be done efficiently.
This has to be made sure of by both
  - the drivers that support the flag
  - and the common block layer (so it will not fall back to any slowpath
    (like writing zero buffers) in case the driver does not support
    the flag).

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 include/block/block.h     |  6 +++++-
 include/block/block_int.h |  2 +-
 block/io.c                | 20 +++++++++++++++++---
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 9b12774..3e31b89 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -65,9 +65,13 @@ typedef enum {
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
+    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
+     * efficiently allocate the space so it reads as zeroes, or return an error.
+     */
+    BDRV_REQ_ALLOCATE           = 0x40,
 
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x3f,
+    BDRV_REQ_MASK               = 0x7f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 29cafa4..b141710 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -632,7 +632,7 @@ struct BlockDriverState {
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
     unsigned int supported_write_flags;
     /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
-     * BDRV_REQ_MAY_UNMAP) */
+     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
     unsigned int supported_zero_flags;
 
     /* the following member gives a name to every node on the bs graph. */
diff --git a/block/io.c b/block/io.c
index 7ea4023..cf2f84c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1424,7 +1424,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
             assert(!bs->supported_zero_flags);
         }
 
-        if (ret == -ENOTSUP) {
+        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
 
@@ -1514,8 +1514,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
     if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
-        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
-        qemu_iovec_is_zero(qiov)) {
+        !(flags & BDRV_REQ_ZERO_WRITE) && !(flags & BDRV_REQ_ALLOCATE) &&
+        drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
         flags |= BDRV_REQ_ZERO_WRITE;
         if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
             flags |= BDRV_REQ_MAY_UNMAP;
@@ -1593,6 +1593,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
     assert(flags & BDRV_REQ_ZERO_WRITE);
     if (head_padding_bytes || tail_padding_bytes) {
+        if (flags & BDRV_REQ_ALLOCATE) {
+            return -ENOTSUP;
+        }
         buf = qemu_blockalign(bs, align);
         iov = (struct iovec) {
             .iov_base   = buf,
@@ -1693,6 +1696,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         return ret;
     }
 
+    /* allocation request with qiov provided doesn't make much sense */
+    assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
+
     bdrv_inc_in_flight(bs);
     /*
      * Align write if necessary by performing a read-modify-write cycle.
@@ -1822,6 +1828,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
     trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
 
+    assert(!((flags & BDRV_REQ_MAY_UNMAP) && (flags & BDRV_REQ_ALLOCATE)));
+
+    if ((flags & BDRV_REQ_ALLOCATE) &&
+        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+    {
+        return -ENOTSUP;
+    }
+
     if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (2 preceding siblings ...)
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 19:48   ` Max Reitz
  2018-01-31 15:11   ` Alberto Garcia
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

The idea is that ALLOCATE requests may overlap with other requests.
Reuse the existing block layer infrastructure for serialising requests.
Use the following approach:
  - mark ALLOCATE serialising, so subsequent requests to the area wait
  - ALLOCATE request itself must never wait if another request is in flight
    already. Return EAGAIN, let the caller reconsider.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/io.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index cf2f84c..4b0d34f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -605,7 +605,8 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
     bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
+                                                   bool nowait)
 {
     BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
@@ -636,11 +637,14 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                  * will wait for us as soon as it wakes up, then just go on
                  * (instead of producing a deadlock in the former case). */
                 if (!req->waiting_for) {
+                    waited = true;
+                    if (nowait) {
+                        break;
+                    }
                     self->waiting_for = req;
                     qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
                     self->waiting_for = NULL;
                     retry = true;
-                    waited = true;
                     break;
                 }
             }
@@ -1206,7 +1210,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
     }
 
     if (!(flags & BDRV_REQ_NO_SERIALISING)) {
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
     }
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1504,7 +1508,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
-    waited = wait_serialising_requests(req);
+    waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
+    if (waited && flags & BDRV_REQ_ALLOCATE) {
+        return -EAGAIN;
+    }
     assert(!waited || !req->serialising);
     assert(req->overlap_offset <= offset);
     assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
@@ -1608,7 +1615,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
         /* RMW the unaligned part before head. */
         mark_request_serialising(req, align);
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_HEAD);
         ret = bdrv_aligned_preadv(child, req, offset & ~(align - 1), align,
                                   align, &local_qiov, 0);
@@ -1628,6 +1635,10 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         bytes -= zero_bytes;
     }
 
+    if (flags & BDRV_REQ_ALLOCATE) {
+        mark_request_serialising(req, align);
+    }
+
     assert(!bytes || (offset & (align - 1)) == 0);
     if (bytes >= align) {
         /* Write the aligned part in the middle. */
@@ -1646,7 +1657,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         assert(align == tail_padding_bytes + bytes);
         /* RMW the unaligned part after tail. */
         mark_request_serialising(req, align);
-        wait_serialising_requests(req);
+        wait_serialising_requests(req, false);
         bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_TAIL);
         ret = bdrv_aligned_preadv(child, req, offset, align,
                                   align, &local_qiov, 0);
@@ -1717,7 +1728,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         struct iovec head_iov;
 
         mark_request_serialising(&req, align);
-        wait_serialising_requests(&req);
+        wait_serialising_requests(&req, false);
 
         head_buf = qemu_blockalign(bs, align);
         head_iov = (struct iovec) {
@@ -1758,7 +1769,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
         bool waited;
 
         mark_request_serialising(&req, align);
-        waited = wait_serialising_requests(&req);
+        waited = wait_serialising_requests(&req, false);
         assert(!waited || !use_local_qiov);
 
         tail_buf = qemu_blockalign(bs, align);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 5/9] file-posix: support BDRV_REQ_ALLOCATE
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (3 preceding siblings ...)
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 19:56   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Current write_zeroes implementation is good enough to satisfy this flag too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/file-posix.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 36ee89e..c36e156 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -558,7 +558,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
     if (S_ISREG(st.st_mode)) {
         s->discard_zeroes = true;
+#ifdef CONFIG_FALLOCATE
         s->has_fallocate = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
+#endif
     }
     if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -593,6 +596,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
         s->is_xfs = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
     }
 #endif
 
@@ -1413,6 +1417,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
         }
         s->has_fallocate = false;
     }
+
+    if (!s->has_fallocate) {
+        aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
+    }
 #endif
 
     return -ENOTSUP;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (4 preceding siblings ...)
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 19:58   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 7/9] qcow2: move is_zero() up Anton Nefedov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Support the flag if the underlying BDS supports it

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/blkdebug.c   | 3 ++-
 block/blkverify.c  | 2 +-
 block/mirror.c     | 2 +-
 block/raw-format.c | 3 ++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index e216699..7d5773d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -400,7 +400,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
     ret = -EINVAL;
 
diff --git a/block/blkverify.c b/block/blkverify.c
index 9ba65d0..b249636 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -145,7 +145,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         s->test_file->bs->supported_write_flags;
 
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags &
         s->test_file->bs->supported_zero_flags;
 
diff --git a/block/mirror.c b/block/mirror.c
index d18ec65..eb41deb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1067,7 +1067,7 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->backing->bs->supported_write_flags;
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->backing->bs->supported_zero_flags;
 }
 
diff --git a/block/raw-format.c b/block/raw-format.c
index ab552c0..b1deb93 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -416,7 +416,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     bs->sg = bs->file->bs->sg;
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
 
     if (bs->probed && !bdrv_is_read_only(bs)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 7/9] qcow2: move is_zero() up
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (5 preceding siblings ...)
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 19:59   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

To be used in the following commit without a forward declaration.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4348b2c..2ed21ff 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1858,6 +1858,23 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
     return false;
 }
 
+static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+    int64_t nr;
+    int res;
+
+    /* Clamp to image length, before checking status of underlying sectors */
+    if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - offset;
+    }
+
+    if (!bytes) {
+        return true;
+    }
+    res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
+    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -2975,24 +2992,6 @@ finish:
     return ret;
 }
 
-
-static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
-{
-    int64_t nr;
-    int res;
-
-    /* Clamp to image length, before checking status of underlying sectors */
-    if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
-        bytes = bs->total_sectors * BDRV_SECTOR_SIZE - offset;
-    }
-
-    if (!bytes) {
-        return true;
-    }
-    res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
-    return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
-}
-
 static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
     int64_t offset, int bytes, BdrvRequestFlags flags)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (6 preceding siblings ...)
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 7/9] qcow2: move is_zero() up Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 20:28   ` Max Reitz
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  8 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

If COW areas of the newly allocated clusters are zeroes on the backing image,
efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
cluster instead of writing explicit zero buffers later in perform_cow().

iotest 060:
write to the discarded cluster does not trigger COW anymore.
Use a backing image instead.

iotest 066:
cluster-alignment areas that were not really COWed are now detected
as zeroes, hence the initial write has to be exactly the same size for
the maps to match

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 qapi/block-core.json       |  4 ++-
 block/qcow2.h              |  6 +++++
 block/qcow2-cluster.c      |  2 +-
 block/qcow2.c              | 66 ++++++++++++++++++++++++++++++++++++++++++++--
 block/trace-events         |  1 +
 tests/qemu-iotests/060     | 26 +++++++++++-------
 tests/qemu-iotests/060.out |  5 +++-
 tests/qemu-iotests/066     |  2 +-
 tests/qemu-iotests/066.out |  4 +--
 9 files changed, 98 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e94a688..1579a77 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2546,6 +2546,8 @@
 #
 # @cor_write: a write due to copy-on-read (since 2.11)
 #
+# @cluster_alloc_space: an allocation of a cluster file space (since 2.12)
+#
 # Since: 2.9
 ##
 { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
@@ -2564,7 +2566,7 @@
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
             'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
             'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-            'cor_write'] }
+            'cor_write', 'cluster_alloc_space'] }
 
 ##
 # @BlkdebugInjectErrorOptions:
diff --git a/block/qcow2.h b/block/qcow2.h
index 46c8cf4..e6e3a22 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -377,6 +377,12 @@ typedef struct QCowL2Meta
     Qcow2COWRegion cow_end;
 
     /**
+     * Indicates that COW regions are already handled and do not require
+     * any more processing.
+     */
+    bool skip_cow;
+
+    /**
      * The I/O vector with the data from the actual guest write request.
      * If non-NULL, this is meant to be merged together with the data
      * from @cow_start and @cow_end into one single write operation.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a3fec27..511ceb8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -791,7 +791,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->offset + start->nb_bytes <= end->offset);
     assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
-    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
+    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->skip_cow) {
         return 0;
     }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 2ed21ff..087cb26 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1833,6 +1833,11 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
             continue;
         }
 
+        /* If COW regions are handled already, skip this too */
+        if (m->skip_cow) {
+            continue;
+        }
+
         /* The data (middle) region must be immediately after the
          * start region */
         if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
@@ -1875,6 +1880,52 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
     return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
 }
 
+static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+{
+    return is_zero(bs, m->offset + m->cow_start.offset,
+                   m->cow_start.nb_bytes) &&
+           is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes);
+}
+
+static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        int ret;
+
+        if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
+            continue;
+        }
+
+        if (bs->encrypted) {
+            continue;
+        }
+
+        if (!is_zero_cow(bs, m)) {
+            continue;
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
+        /* instead of writing zero COW buffers,
+           efficiently zero out the whole clusters */
+        ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
+                                    m->nb_clusters * s->cluster_size,
+                                    BDRV_REQ_ALLOCATE);
+        if (ret < 0) {
+            if (ret != -ENOTSUP && ret != -EAGAIN) {
+                return ret;
+            }
+            continue;
+        }
+
+        trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
+        m->skip_cow = true;
+    }
+    return 0;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1957,24 +2008,35 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
+        qemu_co_mutex_unlock(&s->lock);
+
+        if (bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE) {
+            ret = handle_alloc_space(bs, l2meta);
+            if (ret < 0) {
+                qemu_co_mutex_lock(&s->lock);
+                goto fail;
+            }
+        }
+
         /* If we need to do COW, check if it's possible to merge the
          * writing of the guest data together with that of the COW regions.
          * If it's not possible (or not necessary) then write the
          * guest data now. */
         if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
-            qemu_co_mutex_unlock(&s->lock);
             BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
             trace_qcow2_writev_data(qemu_coroutine_self(),
                                     cluster_offset + offset_in_cluster);
             ret = bdrv_co_pwritev(bs->file,
                                   cluster_offset + offset_in_cluster,
                                   cur_bytes, &hd_qiov, 0);
-            qemu_co_mutex_lock(&s->lock);
             if (ret < 0) {
+                qemu_co_mutex_lock(&s->lock);
                 goto fail;
             }
         }
 
+        qemu_co_mutex_lock(&s->lock);
+
         while (l2meta != NULL) {
             QCowL2Meta *next;
 
diff --git a/block/trace-events b/block/trace-events
index 11c8d5f..c9fa596 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
 qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
 qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
 qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
+qcow2_skip_cow(void* co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
 
 # block/qcow2-cluster.c
 qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 14797dd..92beff4 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -143,27 +143,33 @@ $QEMU_IO -c "$OPEN_RO" -c "read -P 1 0 512" | _filter_qemu_io
 echo
 echo "=== Testing overlap while COW is in flight ==="
 echo
+BACKING_IMG=$TEST_IMG.base
+TEST_IMG=$BACKING_IMG _make_test_img 1G
+
+$QEMU_IO -c 'write 64k 64k' "$BACKING_IMG" | _filter_qemu_io
+
 # compat=0.10 is required in order to make the following discard actually
-# unallocate the sector rather than make it a zero sector - we want COW, after
-# all.
-IMGOPTS='compat=0.10' _make_test_img 1G
+# unallocate the sector rather than make it a zero sector as we would like
+# to reuse it for another guest offset
+IMGOPTS='compat=0.10' _make_test_img -b "$BACKING_IMG" 1G
 # Write two clusters, the second one enforces creation of an L2 table after
 # the first data cluster.
 $QEMU_IO -c 'write 0k 64k' -c 'write 512M 64k' "$TEST_IMG" | _filter_qemu_io
-# Discard the first cluster. This cluster will soon enough be reallocated and
-# used for COW.
+# Discard the first cluster. This cluster will soon enough be reallocated
 $QEMU_IO -c 'discard 0k 64k' "$TEST_IMG" | _filter_qemu_io
 # Now, corrupt the image by marking the second L2 table cluster as free.
 poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
-# Start a write operation requiring COW on the image stopping it right before
-# doing the read; then, trigger the corruption prevention by writing anything to
-# any unallocated cluster, leading to an attempt to overwrite the second L2
+# Start a write operation requiring COW on the image;
+# this write will reuse the host offset released by a previous discard.
+# Stop it right before doing the read.
+# Then, trigger the corruption prevention by writing anything to
+# another unallocated cluster, leading to an attempt to overwrite the second L2
 # table. Finally, resume the COW write and see it fail (but not crash).
 echo "open -o file.driver=blkdebug $TEST_IMG
 break cow_read 0
-aio_write 0k 1k
+aio_write 64k 1k
 wait_break 0
-write 64k 64k
+write 128k 64k
 resume 0" | $QEMU_IO | _filter_qemu_io
 
 echo
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index c4cb7c6..15d95d5 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -97,7 +97,10 @@ read 512/512 bytes at offset 0
 
 === Testing overlap while COW is in flight ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1073741824
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 536870912
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 8638217..3c216a1 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -71,7 +71,7 @@ echo
 _make_test_img $IMG_SIZE
 
 # Create data clusters (not aligned to an L2 table)
-$QEMU_IO -c 'write -P 42 1M 256k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -P 42 $(((1024 + 32) * 1024)) 192k" "$TEST_IMG" | _filter_qemu_io
 orig_map=$($QEMU_IMG map --output=json "$TEST_IMG")
 
 # Convert the data clusters to preallocated zero clusters
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 3d9da9b..093431e 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -19,8 +19,8 @@ Offset          Length          Mapped to       File
 === Writing to preallocated zero clusters ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67109376
-wrote 262144/262144 bytes at offset 1048576
-256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 1081344
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 262144/262144 bytes at offset 1048576
 256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 196608/196608 bytes at offset 1081344
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write
  2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (7 preceding siblings ...)
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-01-18 17:49 ` Anton Nefedov
  2018-01-29 20:30   ` Max Reitz
  2018-01-30 13:03   ` Alberto Garcia
  8 siblings, 2 replies; 38+ messages in thread
From: Anton Nefedov @ 2018-01-18 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 9914415..6083ae4 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -59,6 +59,15 @@ echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
 echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index 972be49..09d46f6 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -5,6 +5,16 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags
  2018-01-18 17:48 ` [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags Anton Nefedov
@ 2018-01-29 19:21   ` Max Reitz
  2018-01-29 19:26     ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:21 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]

On 2018-01-18 18:48, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/mirror.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index c9badc1..d18ec65 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1064,6 +1064,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>      bdrv_refresh_filename(bs->backing->bs);
>      pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>              bs->backing->bs->filename);
> +    bs->supported_write_flags = BDRV_REQ_FUA &
> +        bs->backing->bs->supported_write_flags;
> +    bs->supported_zero_flags =
> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +        bs->backing->bs->supported_zero_flags;
>  }
>  
>  static void bdrv_mirror_top_close(BlockDriverState *bs)

Fundamentally OK, but why is this in *_refresh_filename()?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/9] blkverify: set supported write/zero flags
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 2/9] blkverify: set " Anton Nefedov
@ 2018-01-29 19:23   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:23 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/blkverify.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags
  2018-01-29 19:21   ` Max Reitz
@ 2018-01-29 19:26     ` Eric Blake
  2018-01-30 12:15       ` Anton Nefedov
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2018-01-29 19:26 UTC (permalink / raw)
  To: Max Reitz, Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, den, berto

[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]

On 01/29/2018 01:21 PM, Max Reitz wrote:
> On 2018-01-18 18:48, Anton Nefedov wrote:
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>> ---
>>  block/mirror.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index c9badc1..d18ec65 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1064,6 +1064,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>>      bdrv_refresh_filename(bs->backing->bs);
>>      pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>>              bs->backing->bs->filename);
>> +    bs->supported_write_flags = BDRV_REQ_FUA &
>> +        bs->backing->bs->supported_write_flags;
>> +    bs->supported_zero_flags =
>> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
>> +        bs->backing->bs->supported_zero_flags;
>>  }
>>  
>>  static void bdrv_mirror_top_close(BlockDriverState *bs)
> 
> Fundamentally OK, but why is this in *_refresh_filename()?

Indeed, I missed that (or maybe it got moved during a botched rebase?).
For comparison, blkdebug sets it during blkdebug_open(), and nbd sets it
during nbd_client_init() (called during nbd_open()).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2018-01-29 19:37   ` Max Reitz
  2018-01-30 12:34     ` Anton Nefedov
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:37 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 5325 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> The flag is supposed to indicate that the region of the disk image has
> to be sufficiently allocated so it reads as zeroes.
> 
> The call with the flag set must return -ENOTSUP if allocation cannot
> be done efficiently.
> This has to be made sure of by both
>   - the drivers that support the flag
>   - and the common block layer (so it will not fall back to any slowpath
>     (like writing zero buffers) in case the driver does not support
>     the flag).
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  include/block/block.h     |  6 +++++-
>  include/block/block_int.h |  2 +-
>  block/io.c                | 20 +++++++++++++++++---
>  3 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 9b12774..3e31b89 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -65,9 +65,13 @@ typedef enum {
>      BDRV_REQ_NO_SERIALISING     = 0x8,
>      BDRV_REQ_FUA                = 0x10,
>      BDRV_REQ_WRITE_COMPRESSED   = 0x20,
> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
> +     * efficiently allocate the space so it reads as zeroes, or return an error.

What happens if you specify this for a normal write operation that does
not write zeroes?

(I suppose the answer is "don't do that", but that would need to be
documented more clearly here.)

> +     */
> +    BDRV_REQ_ALLOCATE           = 0x40,
>  
>      /* Mask of valid flags */
> -    BDRV_REQ_MASK               = 0x3f,
> +    BDRV_REQ_MASK               = 0x7f,
>  } BdrvRequestFlags;
>  
>  typedef struct BlockSizes {
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 29cafa4..b141710 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -632,7 +632,7 @@ struct BlockDriverState {
>      /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>      unsigned int supported_write_flags;
>      /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
> -     * BDRV_REQ_MAY_UNMAP) */
> +     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
>      unsigned int supported_zero_flags;
>  
>      /* the following member gives a name to every node on the bs graph. */
> diff --git a/block/io.c b/block/io.c
> index 7ea4023..cf2f84c 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1424,7 +1424,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>              assert(!bs->supported_zero_flags);
>          }
>  
> -        if (ret == -ENOTSUP) {
> +        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
>              /* Fall back to bounce buffer if write zeroes is unsupported */
>              BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
>  
> @@ -1514,8 +1514,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>      ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>  
>      if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
> -        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
> -        qemu_iovec_is_zero(qiov)) {
> +        !(flags & BDRV_REQ_ZERO_WRITE) && !(flags & BDRV_REQ_ALLOCATE) &&
> +        drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {

Do we really need to add the BDRV_REQ_ALLOCATE check here?  If the
caller specifies that flag, then we won't invalidate it by adding the
BDRV_REQ_ZERO_WRITE flag (as long as we don't add BDRV_REQ_MAY_UNMAP).

>          flags |= BDRV_REQ_ZERO_WRITE;
>          if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
>              flags |= BDRV_REQ_MAY_UNMAP;
> @@ -1593,6 +1593,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
>  
>      assert(flags & BDRV_REQ_ZERO_WRITE);
>      if (head_padding_bytes || tail_padding_bytes) {
> +        if (flags & BDRV_REQ_ALLOCATE) {
> +            return -ENOTSUP;
> +        }
>          buf = qemu_blockalign(bs, align);
>          iov = (struct iovec) {
>              .iov_base   = buf,
> @@ -1693,6 +1696,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>          return ret;
>      }
>  
> +    /* allocation request with qiov provided doesn't make much sense */
> +    assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
> +

So I suppose the use of BDRV_REQ_ALLOCATE necessitates the use of
BDRV_REQ_ZERO_WRITE?  That should be documented, then.

Max

>      bdrv_inc_in_flight(bs);
>      /*
>       * Align write if necessary by performing a read-modify-write cycle.
> @@ -1822,6 +1828,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
>  {
>      trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
>  
> +    assert(!((flags & BDRV_REQ_MAY_UNMAP) && (flags & BDRV_REQ_ALLOCATE)));
> +
> +    if ((flags & BDRV_REQ_ALLOCATE) &&
> +        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
> +    {
> +        return -ENOTSUP;
> +    }
> +
>      if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
>          flags &= ~BDRV_REQ_MAY_UNMAP;
>      }
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2018-01-29 19:48   ` Max Reitz
  2018-01-30 12:36     ` Anton Nefedov
  2018-01-31 15:11   ` Alberto Garcia
  1 sibling, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:48 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 1862 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> The idea is that ALLOCATE requests may overlap with other requests.
> Reuse the existing block layer infrastructure for serialising requests.
> Use the following approach:
>   - mark ALLOCATE serialising, so subsequent requests to the area wait
>   - ALLOCATE request itself must never wait if another request is in flight
>     already. Return EAGAIN, let the caller reconsider.
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block/io.c | 27 +++++++++++++++++++--------
>  1 file changed, 19 insertions(+), 8 deletions(-)

The basic principle looks good to me.

> diff --git a/block/io.c b/block/io.c
> index cf2f84c..4b0d34f 100644
> --- a/block/io.c
> +++ b/block/io.c

[...]

> @@ -1717,7 +1728,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>          struct iovec head_iov;
>  
>          mark_request_serialising(&req, align);
> -        wait_serialising_requests(&req);
> +        wait_serialising_requests(&req, false);

What if someone calls bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE |
BDRV_REQ_ALLOCATE?  Then this should do exactly the same as
bdrv_co_do_zero_pwritev(), which it currently does not -- besides this
serialization, this includes returning -ENOTSUP if there is a head or
tail to write.

Max

>  
>          head_buf = qemu_blockalign(bs, align);
>          head_iov = (struct iovec) {
> @@ -1758,7 +1769,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>          bool waited;
>  
>          mark_request_serialising(&req, align);
> -        waited = wait_serialising_requests(&req);
> +        waited = wait_serialising_requests(&req, false);
>          assert(!waited || !use_local_qiov);
>  
>          tail_buf = qemu_blockalign(bs, align);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 5/9] file-posix: support BDRV_REQ_ALLOCATE
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2018-01-29 19:56   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:56 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 365 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> Current write_zeroes implementation is good enough to satisfy this flag too
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/file-posix.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2018-01-29 19:58   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:58 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> Support the flag if the underlying BDS supports it
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/blkdebug.c   | 3 ++-
>  block/blkverify.c  | 2 +-
>  block/mirror.c     | 2 +-
>  block/raw-format.c | 3 ++-
>  4 files changed, 6 insertions(+), 4 deletions(-)

May need a fixup for mirror.c; and maybe quorum would be another driver
to think about (it currently doesn't handle these flags at all).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 7/9] qcow2: move is_zero() up
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 7/9] qcow2: move is_zero() up Anton Nefedov
@ 2018-01-29 19:59   ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-29 19:59 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 396 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> To be used in the following commit without a forward declaration.
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2.c | 35 +++++++++++++++++------------------
>  1 file changed, 17 insertions(+), 18 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-01-29 20:28   ` Max Reitz
  2018-01-30 14:23     ` Anton Nefedov
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-29 20:28 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 5549 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> If COW areas of the newly allocated clusters are zeroes on the backing image,
> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
> cluster instead of writing explicit zero buffers later in perform_cow().
> 
> iotest 060:
> write to the discarded cluster does not trigger COW anymore.
> Use a backing image instead.
> 
> iotest 066:
> cluster-alignment areas that were not really COWed are now detected
> as zeroes, hence the initial write has to be exactly the same size for
> the maps to match
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  qapi/block-core.json       |  4 ++-
>  block/qcow2.h              |  6 +++++
>  block/qcow2-cluster.c      |  2 +-
>  block/qcow2.c              | 66 ++++++++++++++++++++++++++++++++++++++++++++--
>  block/trace-events         |  1 +
>  tests/qemu-iotests/060     | 26 +++++++++++-------
>  tests/qemu-iotests/060.out |  5 +++-
>  tests/qemu-iotests/066     |  2 +-
>  tests/qemu-iotests/066.out |  4 +--
>  9 files changed, 98 insertions(+), 18 deletions(-)

[...]

> @@ -1875,6 +1880,52 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
>      return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
>  }
>  
> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
> +{
> +    return is_zero(bs, m->offset + m->cow_start.offset,
> +                   m->cow_start.nb_bytes) &&
> +           is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes);
> +}
> +
> +static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    QCowL2Meta *m;
> +
> +    for (m = l2meta; m != NULL; m = m->next) {
> +        int ret;
> +
> +        if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
> +            continue;
> +        }
> +
> +        if (bs->encrypted) {
> +            continue;
> +        }

Not sure if the compiler optimizes this anyway, but I'd pull this out of
the loop.

Maybe you could put all of the conditions under which this function can
actually do something at its beginning: That is, we can't do anything if
the BDS is encrypted or if bs->file does not support BDRV_REQ_ALLOCATE
(and then you just call this function unconditionally in
qcow2_co_pwritev()).

> +        if (!is_zero_cow(bs, m)) {
> +            continue;
> +        }

Is this really efficient?  I remember someone complaining about
bdrv_co_block_status() being kind of slow on some filesystems, so
there'd be a tradeoff depending on how it compares to just reading up to
two clusters from the backing file -- especially considering that the OS
can query the same information just as quickly, and thus the only
overhead the read should have is a memset() (assuming the OS is clever).

So basically my question is whether it would be better to just skip this
if we have any backing file at all and only do this optimization if
there is none.

> +
> +        BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
> +        /* instead of writing zero COW buffers,
> +           efficiently zero out the whole clusters */
> +        ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
> +                                    m->nb_clusters * s->cluster_size,
> +                                    BDRV_REQ_ALLOCATE);
> +        if (ret < 0) {
> +            if (ret != -ENOTSUP && ret != -EAGAIN) {
> +                return ret;
> +            }
> +            continue;
> +        }
> +
> +        trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
> +        m->skip_cow = true;
> +    }
> +    return 0;
> +}
> +
>  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>                                           uint64_t bytes, QEMUIOVector *qiov,
>                                           int flags)

[...]

> diff --git a/block/trace-events b/block/trace-events
> index 11c8d5f..c9fa596 100644
> --- a/block/trace-events
> +++ b/block/trace-events
> @@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
>  qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
>  qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
>  qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
> +qcow2_skip_cow(void* co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"

Nit: Should be "void *co" to match the rest.

>  
>  # block/qcow2-cluster.c
>  qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"

[...]

> diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
> index 8638217..3c216a1 100755
> --- a/tests/qemu-iotests/066
> +++ b/tests/qemu-iotests/066
> @@ -71,7 +71,7 @@ echo
>  _make_test_img $IMG_SIZE
>  
>  # Create data clusters (not aligned to an L2 table)
> -$QEMU_IO -c 'write -P 42 1M 256k' "$TEST_IMG" | _filter_qemu_io
> +$QEMU_IO -c "write -P 42 $(((1024 + 32) * 1024)) 192k" "$TEST_IMG" | _filter_qemu_io

The comment should probably be updated to note that we don't write head
and tail because we want them to stay zero, so the mapping information
matches.

Max

>  orig_map=$($QEMU_IMG map --output=json "$TEST_IMG")
>  
>  # Convert the data clusters to preallocated zero clusters


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
@ 2018-01-29 20:30   ` Max Reitz
  2018-01-30 13:03   ` Alberto Garcia
  1 sibling, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-29 20:30 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 396 bytes --]

On 2018-01-18 18:49, Anton Nefedov wrote:
> COW (even empty/zero) areas require encryption too
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  tests/qemu-iotests/134     |  9 +++++++++
>  tests/qemu-iotests/134.out | 10 ++++++++++
>  2 files changed, 19 insertions(+)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags
  2018-01-29 19:26     ` Eric Blake
@ 2018-01-30 12:15       ` Anton Nefedov
  2018-01-30 16:00         ` Eric Blake
  2018-01-31 17:20         ` Max Reitz
  0 siblings, 2 replies; 38+ messages in thread
From: Anton Nefedov @ 2018-01-30 12:15 UTC (permalink / raw)
  To: Eric Blake, Max Reitz, qemu-devel; +Cc: qemu-block, kwolf, den, berto



On 29/1/2018 10:26 PM, Eric Blake wrote:
> On 01/29/2018 01:21 PM, Max Reitz wrote:
>> On 2018-01-18 18:48, Anton Nefedov wrote:
>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>   block/mirror.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/block/mirror.c b/block/mirror.c
>>> index c9badc1..d18ec65 100644
>>> --- a/block/mirror.c
>>> +++ b/block/mirror.c
>>> @@ -1064,6 +1064,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>>>       bdrv_refresh_filename(bs->backing->bs);
>>>       pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>>>               bs->backing->bs->filename);
>>> +    bs->supported_write_flags = BDRV_REQ_FUA &
>>> +        bs->backing->bs->supported_write_flags;
>>> +    bs->supported_zero_flags =
>>> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
>>> +        bs->backing->bs->supported_zero_flags;
>>>   }
>>>   
>>>   static void bdrv_mirror_top_close(BlockDriverState *bs)
>>
>> Fundamentally OK, but why is this in *_refresh_filename()?
> 
> Indeed, I missed that (or maybe it got moved during a botched rebase?).
> For comparison, blkdebug sets it during blkdebug_open(), and nbd sets it
> during nbd_client_init() (called during nbd_open()).
> 

We need a backing bs here and I believe it's not generally set at the 
time of .bdrv_open()

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-01-29 19:37   ` Max Reitz
@ 2018-01-30 12:34     ` Anton Nefedov
  2018-01-31 17:31       ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-30 12:34 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto



On 29/1/2018 10:37 PM, Max Reitz wrote:
> On 2018-01-18 18:49, Anton Nefedov wrote:
>> The flag is supposed to indicate that the region of the disk image has
>> to be sufficiently allocated so it reads as zeroes.
>>
>> The call with the flag set must return -ENOTSUP if allocation cannot
>> be done efficiently.
>> This has to be made sure of by both
>>    - the drivers that support the flag
>>    - and the common block layer (so it will not fall back to any slowpath
>>      (like writing zero buffers) in case the driver does not support
>>      the flag).
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   include/block/block.h     |  6 +++++-
>>   include/block/block_int.h |  2 +-
>>   block/io.c                | 20 +++++++++++++++++---
>>   3 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 9b12774..3e31b89 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -65,9 +65,13 @@ typedef enum {
>>       BDRV_REQ_NO_SERIALISING     = 0x8,
>>       BDRV_REQ_FUA                = 0x10,
>>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
>> +     * efficiently allocate the space so it reads as zeroes, or return an error.
> 
> What happens if you specify this for a normal write operation that does
> not write zeroes?
> 
> (I suppose the answer is "don't do that", but that would need to be
> documented more clearly here.)
> 

I can't quite come up with what a regular write with ALLOCATE flag can
suppose to mean.

Will document that.

>> +     */
>> +    BDRV_REQ_ALLOCATE           = 0x40,
>>   
>>       /* Mask of valid flags */
>> -    BDRV_REQ_MASK               = 0x3f,
>> +    BDRV_REQ_MASK               = 0x7f,
>>   } BdrvRequestFlags;
>>   
>>   typedef struct BlockSizes {
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 29cafa4..b141710 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -632,7 +632,7 @@ struct BlockDriverState {
>>       /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>>       unsigned int supported_write_flags;
>>       /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
>> -     * BDRV_REQ_MAY_UNMAP) */
>> +     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
>>       unsigned int supported_zero_flags;
>>   
>>       /* the following member gives a name to every node on the bs graph. */
>> diff --git a/block/io.c b/block/io.c
>> index 7ea4023..cf2f84c 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1424,7 +1424,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>               assert(!bs->supported_zero_flags);
>>           }
>>   
>> -        if (ret == -ENOTSUP) {
>> +        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
>>               /* Fall back to bounce buffer if write zeroes is unsupported */
>>               BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
>>   
>> @@ -1514,8 +1514,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>>       ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>>   
>>       if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
>> -        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
>> -        qemu_iovec_is_zero(qiov)) {
>> +        !(flags & BDRV_REQ_ZERO_WRITE) && !(flags & BDRV_REQ_ALLOCATE) &&
>> +        drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
> 
> Do we really need to add the BDRV_REQ_ALLOCATE check here?  If the
> caller specifies that flag, then we won't invalidate it by adding the
> BDRV_REQ_ZERO_WRITE flag (as long as we don't add BDRV_REQ_MAY_UNMAP).
> 

Now !(flags & BDRV_REQ_ALLOCATE) is always true here, as REQ_ALLOCATE
implies REQ_ZERO_WRITE.
But conceptually yes I think the check should only forbid setting
MAY_UNMAP.

Offtop: does REQ_ZERO_WRITE override REQ_WRITE_COMPRESSED in this
function? at least with !REQ_MAY_UNMAP it looks wrong

>>           flags |= BDRV_REQ_ZERO_WRITE;
>>           if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
>>               flags |= BDRV_REQ_MAY_UNMAP;
>> @@ -1593,6 +1593,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
>>   
>>       assert(flags & BDRV_REQ_ZERO_WRITE);
>>       if (head_padding_bytes || tail_padding_bytes) {
>> +        if (flags & BDRV_REQ_ALLOCATE) {
>> +            return -ENOTSUP;
>> +        }
>>           buf = qemu_blockalign(bs, align);
>>           iov = (struct iovec) {
>>               .iov_base   = buf,
>> @@ -1693,6 +1696,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>           return ret;
>>       }
>>   
>> +    /* allocation request with qiov provided doesn't make much sense */
>> +    assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
>> +
> 
> So I suppose the use of BDRV_REQ_ALLOCATE necessitates the use of
> BDRV_REQ_ZERO_WRITE?  That should be documented, then.
> 
> Max
> 

Yes, will document.

>>       bdrv_inc_in_flight(bs);
>>       /*
>>        * Align write if necessary by performing a read-modify-write cycle.
>> @@ -1822,6 +1828,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
>>   {
>>       trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
>>   
>> +    assert(!((flags & BDRV_REQ_MAY_UNMAP) && (flags & BDRV_REQ_ALLOCATE)));
>> +
>> +    if ((flags & BDRV_REQ_ALLOCATE) &&
>> +        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
>> +    {
>> +        return -ENOTSUP;
>> +    }
>> +
>>       if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
>>           flags &= ~BDRV_REQ_MAY_UNMAP;
>>       }
>>
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-29 19:48   ` Max Reitz
@ 2018-01-30 12:36     ` Anton Nefedov
  2018-01-31 17:35       ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-30 12:36 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto



On 29/1/2018 10:48 PM, Max Reitz wrote:
> On 2018-01-18 18:49, Anton Nefedov wrote:
>> The idea is that ALLOCATE requests may overlap with other requests.
>> Reuse the existing block layer infrastructure for serialising requests.
>> Use the following approach:
>>    - mark ALLOCATE serialising, so subsequent requests to the area wait
>>    - ALLOCATE request itself must never wait if another request is in flight
>>      already. Return EAGAIN, let the caller reconsider.
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>   block/io.c | 27 +++++++++++++++++++--------
>>   1 file changed, 19 insertions(+), 8 deletions(-)
> 
> The basic principle looks good to me.
> 
>> diff --git a/block/io.c b/block/io.c
>> index cf2f84c..4b0d34f 100644
>> --- a/block/io.c
>> +++ b/block/io.c
> 
> [...]
> 
>> @@ -1717,7 +1728,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>           struct iovec head_iov;
>>   
>>           mark_request_serialising(&req, align);
>> -        wait_serialising_requests(&req);
>> +        wait_serialising_requests(&req, false);
> 
> What if someone calls bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE |
> BDRV_REQ_ALLOCATE?  

Either

     assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));

will fail or bdrv_co_do_zero_pwritev() will be used.

> .. Then this should do exactly the same as
> bdrv_co_do_zero_pwritev(), which it currently does not -- besides this
> serialization, this includes returning -ENOTSUP if there is a head or
> tail to write.
> 

Another question is if that assertion is ok.
In other words: should (qiov!=NULL && REQ_ALLOCATE) be a valid case?
e.g. with qiov filled with zeroes?

I'd rather document that not supported (and leave the assertion).

Actually, even (qiov!=NULL && REQ_ZERO_WRITE) looks kind of
unsupported/broken? Alignment code in bdrv_co_pwritev() zeroes out the
head and tail by passing the flag down bdrv_aligned_pwritev()

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  2018-01-29 20:30   ` Max Reitz
@ 2018-01-30 13:03   ` Alberto Garcia
  1 sibling, 0 replies; 38+ messages in thread
From: Alberto Garcia @ 2018-01-30 13:03 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Thu 18 Jan 2018 06:49:07 PM CET, Anton Nefedov wrote:
> COW (even empty/zero) areas require encryption too
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-29 20:28   ` Max Reitz
@ 2018-01-30 14:23     ` Anton Nefedov
  2018-01-31 17:40       ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-30 14:23 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto



On 29/1/2018 11:28 PM, Max Reitz wrote:
> On 2018-01-18 18:49, Anton Nefedov wrote:
>> If COW areas of the newly allocated clusters are zeroes on the backing image,
>> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
>> cluster instead of writing explicit zero buffers later in perform_cow().
>>
>> iotest 060:
>> write to the discarded cluster does not trigger COW anymore.
>> Use a backing image instead.
>>
>> iotest 066:
>> cluster-alignment areas that were not really COWed are now detected
>> as zeroes, hence the initial write has to be exactly the same size for
>> the maps to match
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> ---
>>   qapi/block-core.json       |  4 ++-
>>   block/qcow2.h              |  6 +++++
>>   block/qcow2-cluster.c      |  2 +-
>>   block/qcow2.c              | 66 ++++++++++++++++++++++++++++++++++++++++++++--
>>   block/trace-events         |  1 +
>>   tests/qemu-iotests/060     | 26 +++++++++++-------
>>   tests/qemu-iotests/060.out |  5 +++-
>>   tests/qemu-iotests/066     |  2 +-
>>   tests/qemu-iotests/066.out |  4 +--
>>   9 files changed, 98 insertions(+), 18 deletions(-)
> 
> [...]
> 
>> @@ -1875,6 +1880,52 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
>>       return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
>>   }
>>   
>> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
>> +{
>> +    return is_zero(bs, m->offset + m->cow_start.offset,
>> +                   m->cow_start.nb_bytes) &&
>> +           is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes);
>> +}
>> +
>> +static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +    QCowL2Meta *m;
>> +
>> +    for (m = l2meta; m != NULL; m = m->next) {
>> +        int ret;
>> +
>> +        if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
>> +            continue;
>> +        }
>> +
>> +        if (bs->encrypted) {
>> +            continue;
>> +        }
> 
> Not sure if the compiler optimizes this anyway, but I'd pull this out of
> the loop.
> 

An imprint of the following patches (which were dropped from this
series) - preallocation ahead of image EOF, which takes action
regardless of image encryption.

But I'll leave the check outside the loop until it's needed
there.


> Maybe you could put all of the conditions under which this function can
> actually do something at its beginning: That is, we can't do anything if
> the BDS is encrypted or if bs->file does not support BDRV_REQ_ALLOCATE
> (and then you just call this function unconditionally in
> qcow2_co_pwritev()).
> 

Done.

>> +        if (!is_zero_cow(bs, m)) {
>> +            continue;
>> +        }
> 
> Is this really efficient?  I remember someone complaining about
> bdrv_co_block_status() being kind of slow on some filesystems, so
> there'd be a tradeoff depending on how it compares to just reading up to
> two clusters from the backing file -- especially considering that the OS
> can query the same information just as quickly, and thus the only
> overhead the read should have is a memset() (assuming the OS is clever).
> 
> So basically my question is whether it would be better to just skip this
> if we have any backing file at all and only do this optimization if
> there is none.
> 

So what we trade between is
(read+write) vs (lseek+fallocate or lseek+read+write).

Indeed if it comes to lseek the profit is smaller, and we're probably
unlikely to find a hole anyway.

Maybe it's good enough to cover these cases:
  1. no backing
  2. beyond bdrv_getlength() in backing
  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
                                           in backing->file')

1 & 2 are easy to check;
3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
for qcow2 exclusively and if there is raw (or any other format) backing
image - do the COW

>> +
>> +        BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
>> +        /* instead of writing zero COW buffers,
>> +           efficiently zero out the whole clusters */
>> +        ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
>> +                                    m->nb_clusters * s->cluster_size,
>> +                                    BDRV_REQ_ALLOCATE);
>> +        if (ret < 0) {
>> +            if (ret != -ENOTSUP && ret != -EAGAIN) {
>> +                return ret;
>> +            }
>> +            continue;
>> +        }
>> +
>> +        trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
>> +        m->skip_cow = true;
>> +    }
>> +    return 0;
>> +}
>> +
>>   static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
>>                                            uint64_t bytes, QEMUIOVector *qiov,
>>                                            int flags)
> 
> [...]
> 
>> diff --git a/block/trace-events b/block/trace-events
>> index 11c8d5f..c9fa596 100644
>> --- a/block/trace-events
>> +++ b/block/trace-events
>> @@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
>>   qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
>>   qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
>>   qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
>> +qcow2_skip_cow(void* co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
> 
> Nit: Should be "void *co" to match the rest.
> 

apparently checkpatch.pl doesn't cover events files :) thanks, done

>>   
>>   # block/qcow2-cluster.c
>>   qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
> 
> [...]
> 
>> diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
>> index 8638217..3c216a1 100755
>> --- a/tests/qemu-iotests/066
>> +++ b/tests/qemu-iotests/066
>> @@ -71,7 +71,7 @@ echo
>>   _make_test_img $IMG_SIZE
>>   
>>   # Create data clusters (not aligned to an L2 table)
>> -$QEMU_IO -c 'write -P 42 1M 256k' "$TEST_IMG" | _filter_qemu_io
>> +$QEMU_IO -c "write -P 42 $(((1024 + 32) * 1024)) 192k" "$TEST_IMG" | _filter_qemu_io
> 
> The comment should probably be updated to note that we don't write head
> and tail because we want them to stay zero, so the mapping information
> matches.
> 
> Max
> 

Done.

>>   orig_map=$($QEMU_IMG map --output=json "$TEST_IMG")
>>   
>>   # Convert the data clusters to preallocated zero clusters
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags
  2018-01-30 12:15       ` Anton Nefedov
@ 2018-01-30 16:00         ` Eric Blake
  2018-01-31 17:20         ` Max Reitz
  1 sibling, 0 replies; 38+ messages in thread
From: Eric Blake @ 2018-01-30 16:00 UTC (permalink / raw)
  To: Anton Nefedov, Max Reitz, qemu-devel; +Cc: qemu-block, kwolf, den, berto

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On 01/30/2018 06:15 AM, Anton Nefedov wrote:

>>>> @@ -1064,6 +1064,11 @@ static void
>>>> bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>>>>       bdrv_refresh_filename(bs->backing->bs);
>>>>       pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>>>>               bs->backing->bs->filename);
>>>> +    bs->supported_write_flags = BDRV_REQ_FUA &
>>>> +        bs->backing->bs->supported_write_flags;

>>> Fundamentally OK, but why is this in *_refresh_filename()?
>>
>> Indeed, I missed that (or maybe it got moved during a botched rebase?).
>> For comparison, blkdebug sets it during blkdebug_open(), and nbd sets it
>> during nbd_client_init() (called during nbd_open()).
>>
> 
> We need a backing bs here and I believe it's not generally set at the
> time of .bdrv_open()

Then is mirror_start_job() a better location, right after we call
bdrv_new_open_driver()?  (Maybe this just goes to show I haven't fully
traced the lifecycle of the mirror driver, and it may all be changing
anyways as we try to fix the BDS graph modifications related with mirrors).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
  2018-01-29 19:48   ` Max Reitz
@ 2018-01-31 15:11   ` Alberto Garcia
  2018-01-31 17:11     ` Anton Nefedov
  1 sibling, 1 reply; 38+ messages in thread
From: Alberto Garcia @ 2018-01-31 15:11 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Thu 18 Jan 2018 06:49:02 PM CET, Anton Nefedov wrote:

> -static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
> +static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
> +                                                   bool nowait)

It's a bit confusing to have a function called wait_foo() with a
parameter that says "don't wait"...

How about

     check_serialising_requests(BdrvTrackedRequest *self, bool wait)

> -    waited = wait_serialising_requests(req);
> +    waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
> +    if (waited && flags & BDRV_REQ_ALLOCATE) {
> +        return -EAGAIN;
> +    }

I find this more readable (even if not strictly necessary):

       if (waited && (flags & BDRV_REQ_ALLOCATE)) {

None of my two comments are blockers, though, so

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-31 15:11   ` Alberto Garcia
@ 2018-01-31 17:11     ` Anton Nefedov
  2018-02-01 14:01       ` Alberto Garcia
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-01-31 17:11 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den



On 31/1/2018 6:11 PM, Alberto Garcia wrote:
> On Thu 18 Jan 2018 06:49:02 PM CET, Anton Nefedov wrote:
> 
>> -static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>> +static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
>> +                                                   bool nowait)
> 
> It's a bit confusing to have a function called wait_foo() with a
> parameter that says "don't wait"...
> 
> How about
> 
>       check_serialising_requests(BdrvTrackedRequest *self, bool wait)
> 

I think it might be more important to emphasize in the name that the
function _might_ wait.

i.e. it feels worse to read
   check_serialising_requests(req, true);
when one needs to follow the function to find out that it might yield.

Personally I'd vote for

     static int check_or_wait_serialising_requests(
         BdrvTrackedRequest *self, bool wait) {}

and maybe even:

     static int check_serialising_requests(BdrvTrackedRequest *self) {
         return check_or_wait_serialising_requests(self, false);

     static int wait_serialising_requests(BdrvTrackedRequest *self) {
         return check_or_wait_serialising_requests(self, true);
     }

>> -    waited = wait_serialising_requests(req);
>> +    waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
>> +    if (waited && flags & BDRV_REQ_ALLOCATE) {
>> +        return -EAGAIN;
>> +    }
> 
> I find this more readable (even if not strictly necessary):
> 
>         if (waited && (flags & BDRV_REQ_ALLOCATE)) {
> 

Done!

> None of my two comments are blockers, though, so
> 
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> 
> Berto
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags
  2018-01-30 12:15       ` Anton Nefedov
  2018-01-30 16:00         ` Eric Blake
@ 2018-01-31 17:20         ` Max Reitz
  1 sibling, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-31 17:20 UTC (permalink / raw)
  To: Anton Nefedov, Eric Blake, qemu-devel; +Cc: qemu-block, kwolf, den, berto

[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]

On 2018-01-30 13:15, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 10:26 PM, Eric Blake wrote:
>> On 01/29/2018 01:21 PM, Max Reitz wrote:
>>> On 2018-01-18 18:48, Anton Nefedov wrote:
>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>> ---
>>>>   block/mirror.c | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/block/mirror.c b/block/mirror.c
>>>> index c9badc1..d18ec65 100644
>>>> --- a/block/mirror.c
>>>> +++ b/block/mirror.c
>>>> @@ -1064,6 +1064,11 @@ static void
>>>> bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
>>>>       bdrv_refresh_filename(bs->backing->bs);
>>>>       pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
>>>>               bs->backing->bs->filename);
>>>> +    bs->supported_write_flags = BDRV_REQ_FUA &
>>>> +        bs->backing->bs->supported_write_flags;
>>>> +    bs->supported_zero_flags =
>>>> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
>>>> +        bs->backing->bs->supported_zero_flags;
>>>>   }
>>>>     static void bdrv_mirror_top_close(BlockDriverState *bs)
>>>
>>> Fundamentally OK, but why is this in *_refresh_filename()?
>>
>> Indeed, I missed that (or maybe it got moved during a botched rebase?).
>> For comparison, blkdebug sets it during blkdebug_open(), and nbd sets it
>> during nbd_client_init() (called during nbd_open()).
>>
> 
> We need a backing bs here and I believe it's not generally set at the
> time of .bdrv_open().

Well, but *_refresh_filename() is just wrong.

This driver doesn't have .bdrv_open() at all, and we create it fully
manually in mirror_start_job() anyway (as Eric has said).  Therefore, we
can just do this right after bdrv_append() there has succeeded.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-01-30 12:34     ` Anton Nefedov
@ 2018-01-31 17:31       ` Max Reitz
  2018-02-01 13:34         ` Anton Nefedov
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-31 17:31 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 5489 bytes --]

On 2018-01-30 13:34, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 10:37 PM, Max Reitz wrote:
>> On 2018-01-18 18:49, Anton Nefedov wrote:
>>> The flag is supposed to indicate that the region of the disk image has
>>> to be sufficiently allocated so it reads as zeroes.
>>>
>>> The call with the flag set must return -ENOTSUP if allocation cannot
>>> be done efficiently.
>>> This has to be made sure of by both
>>>    - the drivers that support the flag
>>>    - and the common block layer (so it will not fall back to any
>>> slowpath
>>>      (like writing zero buffers) in case the driver does not support
>>>      the flag).
>>>
>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>   include/block/block.h     |  6 +++++-
>>>   include/block/block_int.h |  2 +-
>>>   block/io.c                | 20 +++++++++++++++++---
>>>   3 files changed, 23 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/block/block.h b/include/block/block.h
>>> index 9b12774..3e31b89 100644
>>> --- a/include/block/block.h
>>> +++ b/include/block/block.h
>>> @@ -65,9 +65,13 @@ typedef enum {
>>>       BDRV_REQ_NO_SERIALISING     = 0x8,
>>>       BDRV_REQ_FUA                = 0x10,
>>>       BDRV_REQ_WRITE_COMPRESSED   = 0x20,
>>> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the
>>> driver has to
>>> +     * efficiently allocate the space so it reads as zeroes, or
>>> return an error.
>>
>> What happens if you specify this for a normal write operation that does
>> not write zeroes?
>>
>> (I suppose the answer is "don't do that", but that would need to be
>> documented more clearly here.)
>>
> 
> I can't quite come up with what a regular write with ALLOCATE flag can
> suppose to mean.

It could mean that when zero detection is active, that zero range will
be allocated.  But considering ALLOCATE explicitly means "do not write
data", it probably doesn't make sense for data writes in general.

> Will document that.

Thanks!

>>> +     */
>>> +    BDRV_REQ_ALLOCATE           = 0x40,
>>>         /* Mask of valid flags */
>>> -    BDRV_REQ_MASK               = 0x3f,
>>> +    BDRV_REQ_MASK               = 0x7f,
>>>   } BdrvRequestFlags;
>>>     typedef struct BlockSizes {
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 29cafa4..b141710 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -632,7 +632,7 @@ struct BlockDriverState {
>>>       /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
>>>       unsigned int supported_write_flags;
>>>       /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
>>> -     * BDRV_REQ_MAY_UNMAP) */
>>> +     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
>>>       unsigned int supported_zero_flags;
>>>         /* the following member gives a name to every node on the bs
>>> graph. */
>>> diff --git a/block/io.c b/block/io.c
>>> index 7ea4023..cf2f84c 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1424,7 +1424,7 @@ static int coroutine_fn
>>> bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>>               assert(!bs->supported_zero_flags);
>>>           }
>>>   -        if (ret == -ENOTSUP) {
>>> +        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
>>>               /* Fall back to bounce buffer if write zeroes is
>>> unsupported */
>>>               BdrvRequestFlags write_flags = flags &
>>> ~BDRV_REQ_ZERO_WRITE;
>>>   @@ -1514,8 +1514,8 @@ static int coroutine_fn
>>> bdrv_aligned_pwritev(BdrvChild *child,
>>>       ret =
>>> notifier_with_return_list_notify(&bs->before_write_notifiers, req);
>>>         if (!ret && bs->detect_zeroes !=
>>> BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
>>> -        !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
>>> -        qemu_iovec_is_zero(qiov)) {
>>> +        !(flags & BDRV_REQ_ZERO_WRITE) && !(flags &
>>> BDRV_REQ_ALLOCATE) &&
>>> +        drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
>>
>> Do we really need to add the BDRV_REQ_ALLOCATE check here?  If the
>> caller specifies that flag, then we won't invalidate it by adding the
>> BDRV_REQ_ZERO_WRITE flag (as long as we don't add BDRV_REQ_MAY_UNMAP).
>>
> 
> Now !(flags & BDRV_REQ_ALLOCATE) is always true here, as REQ_ALLOCATE
> implies REQ_ZERO_WRITE.
> But conceptually yes I think the check should only forbid setting
> MAY_UNMAP.
> 
> Offtop: does REQ_ZERO_WRITE override REQ_WRITE_COMPRESSED in this
> function? at least with !REQ_MAY_UNMAP it looks wrong

Looks like zero detection will indeed override compression.  I think
that was intended, but I don't even have an opinion either way.

Of course, it wouldn't be so nice if you tried to compress something and
then, because the zero write failed, you actually write uncompressed
zeroes...  But since zero detection is an optional feature, it might be
your own fault if you enable it when you want compression anyway, and if
you write to some format/protocol combination that doesn't allow zero
writes.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-30 12:36     ` Anton Nefedov
@ 2018-01-31 17:35       ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-01-31 17:35 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 2580 bytes --]

On 2018-01-30 13:36, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 10:48 PM, Max Reitz wrote:
>> On 2018-01-18 18:49, Anton Nefedov wrote:
>>> The idea is that ALLOCATE requests may overlap with other requests.
>>> Reuse the existing block layer infrastructure for serialising requests.
>>> Use the following approach:
>>>    - mark ALLOCATE serialising, so subsequent requests to the area wait
>>>    - ALLOCATE request itself must never wait if another request is in
>>> flight
>>>      already. Return EAGAIN, let the caller reconsider.
>>>
>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>   block/io.c | 27 +++++++++++++++++++--------
>>>   1 file changed, 19 insertions(+), 8 deletions(-)
>>
>> The basic principle looks good to me.
>>
>>> diff --git a/block/io.c b/block/io.c
>>> index cf2f84c..4b0d34f 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>
>> [...]
>>
>>> @@ -1717,7 +1728,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
>>>           struct iovec head_iov;
>>>             mark_request_serialising(&req, align);
>>> -        wait_serialising_requests(&req);
>>> +        wait_serialising_requests(&req, false);
>>
>> What if someone calls bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE |
>> BDRV_REQ_ALLOCATE?  
> 
> Either
> 
>     assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
> 
> will fail or bdrv_co_do_zero_pwritev() will be used.

Ah, right, I didn't see that condition there.

>> .. Then this should do exactly the same as
>> bdrv_co_do_zero_pwritev(), which it currently does not -- besides this
>> serialization, this includes returning -ENOTSUP if there is a head or
>> tail to write.
>>
> 
> Another question is if that assertion is ok.
> In other words: should (qiov!=NULL && REQ_ALLOCATE) be a valid case?
> e.g. with qiov filled with zeroes?

I think it's OK to leave the assertion that way.  But maybe move it
down, under the bdrv_co_do_zero_pwritev() call (and then omit the qiov
!= NULL, because that's guaranteed then)?

(But maybe not everyone's as blind as me.)

> I'd rather document that not supported (and leave the assertion).
> 
> Actually, even (qiov!=NULL && REQ_ZERO_WRITE) looks kind of
> unsupported/broken? Alignment code in bdrv_co_pwritev() zeroes out the
> head and tail by passing the flag down bdrv_aligned_pwritev()

Yes.  Maybe we should have an assertion that you aren't allowed to pass
a qiov with REQ_ZERO_WRITE...

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-30 14:23     ` Anton Nefedov
@ 2018-01-31 17:40       ` Max Reitz
  2018-01-31 18:32         ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-31 17:40 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 4946 bytes --]

On 2018-01-30 15:23, Anton Nefedov wrote:
> 
> 
> On 29/1/2018 11:28 PM, Max Reitz wrote:
>> On 2018-01-18 18:49, Anton Nefedov wrote:
>>> If COW areas of the newly allocated clusters are zeroes on the
>>> backing image,
>>> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on
>>> the whole
>>> cluster instead of writing explicit zero buffers later in perform_cow().
>>>
>>> iotest 060:
>>> write to the discarded cluster does not trigger COW anymore.
>>> Use a backing image instead.
>>>
>>> iotest 066:
>>> cluster-alignment areas that were not really COWed are now detected
>>> as zeroes, hence the initial write has to be exactly the same size for
>>> the maps to match
>>>
>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>> ---
>>>   qapi/block-core.json       |  4 ++-
>>>   block/qcow2.h              |  6 +++++
>>>   block/qcow2-cluster.c      |  2 +-
>>>   block/qcow2.c              | 66
>>> ++++++++++++++++++++++++++++++++++++++++++++--
>>>   block/trace-events         |  1 +
>>>   tests/qemu-iotests/060     | 26 +++++++++++-------
>>>   tests/qemu-iotests/060.out |  5 +++-
>>>   tests/qemu-iotests/066     |  2 +-
>>>   tests/qemu-iotests/066.out |  4 +--
>>>   9 files changed, 98 insertions(+), 18 deletions(-)
>>
>> [...]
>>
>>> @@ -1875,6 +1880,52 @@ static bool is_zero(BlockDriverState *bs,
>>> int64_t offset, int64_t bytes)
>>>       return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
>>>   }
>>>   +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
>>> +{
>>> +    return is_zero(bs, m->offset + m->cow_start.offset,
>>> +                   m->cow_start.nb_bytes) &&
>>> +           is_zero(bs, m->offset + m->cow_end.offset,
>>> m->cow_end.nb_bytes);
>>> +}
>>> +
>>> +static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>>> +{
>>> +    BDRVQcow2State *s = bs->opaque;
>>> +    QCowL2Meta *m;
>>> +
>>> +    for (m = l2meta; m != NULL; m = m->next) {
>>> +        int ret;
>>> +
>>> +        if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
>>> +            continue;
>>> +        }
>>> +
>>> +        if (bs->encrypted) {
>>> +            continue;
>>> +        }
>>
>> Not sure if the compiler optimizes this anyway, but I'd pull this out of
>> the loop.
>>
> 
> An imprint of the following patches (which were dropped from this
> series) - preallocation ahead of image EOF, which takes action
> regardless of image encryption.
> 
> But I'll leave the check outside the loop until it's needed
> there.
> 
> 
>> Maybe you could put all of the conditions under which this function can
>> actually do something at its beginning: That is, we can't do anything if
>> the BDS is encrypted or if bs->file does not support BDRV_REQ_ALLOCATE
>> (and then you just call this function unconditionally in
>> qcow2_co_pwritev()).
>>
> 
> Done.
> 
>>> +        if (!is_zero_cow(bs, m)) {
>>> +            continue;
>>> +        }
>>
>> Is this really efficient?  I remember someone complaining about
>> bdrv_co_block_status() being kind of slow on some filesystems, so
>> there'd be a tradeoff depending on how it compares to just reading up to
>> two clusters from the backing file -- especially considering that the OS
>> can query the same information just as quickly, and thus the only
>> overhead the read should have is a memset() (assuming the OS is clever).
>>
>> So basically my question is whether it would be better to just skip this
>> if we have any backing file at all and only do this optimization if
>> there is none.
>>
> 
> So what we trade between is
> (read+write) vs (lseek+fallocate or lseek+read+write).
> 
> Indeed if it comes to lseek the profit is smaller, and we're probably
> unlikely to find a hole anyway.
> 
> Maybe it's good enough to cover these cases:
>  1. no backing
>  2. beyond bdrv_getlength() in backing
>  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
>                                           in backing->file')
> 
> 1 & 2 are easy to check;

Not sure how useful 2 is, though.  (I don't know.  I always hear about
people wanting to optimize for such a case where a backing file is
shorter than the overlay, but I can't imagine a real use case for that.)

> 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
> for qcow2 exclusively and if there is raw (or any other format) backing
> image - do the COW

Yes, well, it seems useful in principle...  But it would require general
block layer work indeed.  Maybe a new flag for bdrv_block_status*() that
says only to look into the format layer?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-31 17:40       ` Max Reitz
@ 2018-01-31 18:32         ` Eric Blake
  2018-01-31 18:35           ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2018-01-31 18:32 UTC (permalink / raw)
  To: Max Reitz, Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, den, berto

[-- Attachment #1: Type: text/plain, Size: 1716 bytes --]

On 01/31/2018 11:40 AM, Max Reitz wrote:

>> Maybe it's good enough to cover these cases:
>>  1. no backing
>>  2. beyond bdrv_getlength() in backing
>>  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
>>                                           in backing->file')
>>
>> 1 & 2 are easy to check;
> 
> Not sure how useful 2 is, though.  (I don't know.  I always hear about
> people wanting to optimize for such a case where a backing file is
> shorter than the overlay, but I can't imagine a real use case for that.)

I can; here's what's happened to me personally.  I created an image, and
took internal snapshots (yeah, I know those aren't in fashion these
days, but this was long ago).  Later, I ran out of space.  I wanted to
resize the image, but am not convinced whether resizing the image will
play nicely with the internal snapshots (in fact, I don't recall
off-hand whether this was something we prevented in the past and now
support, or if it is still unsupported now...) - so the easiest way is
to create a larger overlay image.

> 
>> 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
>> for qcow2 exclusively and if there is raw (or any other format) backing
>> image - do the COW
> 
> Yes, well, it seems useful in principle...  But it would require general
> block layer work indeed.  Maybe a new flag for bdrv_block_status*() that
> says only to look into the format layer?

Maybe indeed - but first I want my byte-based block_status stuff to land ;)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-31 18:32         ` Eric Blake
@ 2018-01-31 18:35           ` Max Reitz
  2018-01-31 18:43             ` Eric Blake
  0 siblings, 1 reply; 38+ messages in thread
From: Max Reitz @ 2018-01-31 18:35 UTC (permalink / raw)
  To: Eric Blake, Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, den, berto

[-- Attachment #1: Type: text/plain, Size: 1947 bytes --]

On 2018-01-31 19:32, Eric Blake wrote:
> On 01/31/2018 11:40 AM, Max Reitz wrote:
> 
>>> Maybe it's good enough to cover these cases:
>>>  1. no backing
>>>  2. beyond bdrv_getlength() in backing
>>>  3. unallocated in backing qcow2 (covers 'beyond bdrv_getlength()
>>>                                           in backing->file')
>>>
>>> 1 & 2 are easy to check;
>>
>> Not sure how useful 2 is, though.  (I don't know.  I always hear about
>> people wanting to optimize for such a case where a backing file is
>> shorter than the overlay, but I can't imagine a real use case for that.)
> 
> I can; here's what's happened to me personally.  I created an image, and
> took internal snapshots (yeah, I know those aren't in fashion these
> days, but this was long ago).  Later, I ran out of space.  I wanted to
> resize the image, but am not convinced whether resizing the image will
> play nicely with the internal snapshots (in fact, I don't recall
> off-hand whether this was something we prevented in the past and now
> support, or if it is still unsupported now...) - so the easiest way is
> to create a larger overlay image.

But you were convinced that creating an overlay would play nicely with
the internal snapshots? ;-)

I'm not sure whether that sounds like a use case I'd want to optimize
for, but, well.

>>> 3: if that's not too hacky maybe we can do the bdrv_is_allocated() check
>>> for qcow2 exclusively and if there is raw (or any other format) backing
>>> image - do the COW
>>
>> Yes, well, it seems useful in principle...  But it would require general
>> block layer work indeed.  Maybe a new flag for bdrv_block_status*() that
>> says only to look into the format layer?
> 
> Maybe indeed - but first I want my byte-based block_status stuff to land ;)

It could be done as an optimization in a follow-up to this series
anyway, yes.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-01-31 18:35           ` Max Reitz
@ 2018-01-31 18:43             ` Eric Blake
  0 siblings, 0 replies; 38+ messages in thread
From: Eric Blake @ 2018-01-31 18:43 UTC (permalink / raw)
  To: Max Reitz, Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, den, berto

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

On 01/31/2018 12:35 PM, Max Reitz wrote:

>>> Not sure how useful 2 is, though.  (I don't know.  I always hear about
>>> people wanting to optimize for such a case where a backing file is
>>> shorter than the overlay, but I can't imagine a real use case for that.)
>>
>> I can; here's what's happened to me personally.  I created an image, and
>> took internal snapshots (yeah, I know those aren't in fashion these
>> days, but this was long ago).  Later, I ran out of space.  I wanted to
>> resize the image, but am not convinced whether resizing the image will
>> play nicely with the internal snapshots (in fact, I don't recall
>> off-hand whether this was something we prevented in the past and now
>> support, or if it is still unsupported now...) - so the easiest way is
>> to create a larger overlay image.
> 
> But you were convinced that creating an overlay would play nicely with
> the internal snapshots? ;-)

Yes. As long as I don't mind pointing back to the original file (rather
than the overlay) at the point I attempt to revert to the internal
snapshot, then loading the snapshot doesn't have to worry about a size
change.

> 
> I'm not sure whether that sounds like a use case I'd want to optimize
> for, but, well.

I guess it boils down to whether the optimization is a low-maintenance
freebie.  There's no reason to reject the optimization if a patch
demonstrates it is easy to support and it has iotests coverage.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-01-31 17:31       ` Max Reitz
@ 2018-02-01 13:34         ` Anton Nefedov
  2018-02-01 18:06           ` Max Reitz
  0 siblings, 1 reply; 38+ messages in thread
From: Anton Nefedov @ 2018-02-01 13:34 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

On 31/1/2018 8:31 PM, Max Reitz wrote:
> On 2018-01-30 13:34, Anton Nefedov wrote:
>> Offtop: does REQ_ZERO_WRITE override REQ_WRITE_COMPRESSED in this
>> function? at least with !REQ_MAY_UNMAP it looks wrong
> 
> Looks like zero detection will indeed override compression.  I think
> that was intended, but I don't even have an opinion either way.
> 
> Of course, it wouldn't be so nice if you tried to compress something and
> then, because the zero write failed, you actually write uncompressed
> zeroes...  But since zero detection is an optional feature, it might be
> your own fault if you enable it when you want compression anyway, and if
> you write to some format/protocol combination that doesn't allow zero
> writes.
> 
> Max
> 

Is it detection only? I guess in case the caller sets
(REQ_ZERO_WRITE | REQ_COMPRESSED) it also fires here.

Looks like noone does that though, but for example backup might;
it supports compression, and it also does a zero detection itself and
calls write_zeroes() when it can.
It sets REQ_MAY_UNMAP which should indeed override a compression, but
unmap might be not supported.

/Anton

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-01-31 17:11     ` Anton Nefedov
@ 2018-02-01 14:01       ` Alberto Garcia
  0 siblings, 0 replies; 38+ messages in thread
From: Alberto Garcia @ 2018-02-01 14:01 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Wed 31 Jan 2018 06:11:27 PM CET, Anton Nefedov wrote:
> On 31/1/2018 6:11 PM, Alberto Garcia wrote:
>> On Thu 18 Jan 2018 06:49:02 PM CET, Anton Nefedov wrote:
>> 
>>> -static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
>>> +static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
>>> +                                                   bool nowait)
>> 
>> It's a bit confusing to have a function called wait_foo() with a
>> parameter that says "don't wait"...
>> 
>> How about
>> 
>>       check_serialising_requests(BdrvTrackedRequest *self, bool wait)
>> 
>
> I think it might be more important to emphasize in the name that the
> function _might_ wait.
>
> i.e. it feels worse to read
>    check_serialising_requests(req, true);
> when one needs to follow the function to find out that it might yield.
>
> Personally I'd vote for
>
>      static int check_or_wait_serialising_requests(
>          BdrvTrackedRequest *self, bool wait) {}
>
> and maybe even:
>
>      static int check_serialising_requests(BdrvTrackedRequest *self) {
>          return check_or_wait_serialising_requests(self, false);
>
>      static int wait_serialising_requests(BdrvTrackedRequest *self) {
>          return check_or_wait_serialising_requests(self, true);
>      }

You're right. Either approach works for me though, also keeping the
current solution, wait_serialising_requests(req, true).

Berto

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-02-01 13:34         ` Anton Nefedov
@ 2018-02-01 18:06           ` Max Reitz
  0 siblings, 0 replies; 38+ messages in thread
From: Max Reitz @ 2018-02-01 18:06 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, eblake, den, berto

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

On 2018-02-01 14:34, Anton Nefedov wrote:
> On 31/1/2018 8:31 PM, Max Reitz wrote:
>> On 2018-01-30 13:34, Anton Nefedov wrote:
>>> Offtop: does REQ_ZERO_WRITE override REQ_WRITE_COMPRESSED in this
>>> function? at least with !REQ_MAY_UNMAP it looks wrong
>>
>> Looks like zero detection will indeed override compression.  I think
>> that was intended, but I don't even have an opinion either way.
>>
>> Of course, it wouldn't be so nice if you tried to compress something and
>> then, because the zero write failed, you actually write uncompressed
>> zeroes...  But since zero detection is an optional feature, it might be
>> your own fault if you enable it when you want compression anyway, and if
>> you write to some format/protocol combination that doesn't allow zero
>> writes.
>>
>> Max
>>
> 
> Is it detection only? I guess in case the caller sets
> (REQ_ZERO_WRITE | REQ_COMPRESSED) it also fires here.

True, but that's the caller's "fault".  One of those things has to take
precedence.

And I've just noticed that when bdrv_co_do_pwrite_zeroes() falls back to
a bounce buffer, it passes the original flags (without
BDRV_REQ_ZERO_WRITE) to bdrv_driver_pwritev(), so compression will take
effect then.  So the only result is that we first try a zero write, and
if that fails, we do a compressed write.  That sounds reasonable to me.

(I mean, we could do it the other way around, but I firstly I don't
think it matters much and secondly it's probably better this way because
I'd suspect zero writes to be more efficient than compressing the bounce
buffer; at least that's how it is for qcow2.)

Max

> Looks like noone does that though, but for example backup might;
> it supports compression, and it also does a zero detection itself and
> calls write_zeroes() when it can.
> It sets REQ_MAY_UNMAP which should indeed override a compression, but
> unmap might be not supported.
> 
> /Anton



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-02-01 18:06 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-18 17:48 [Qemu-devel] [PATCH v7 0/9] qcow2: cluster space preallocation Anton Nefedov
2018-01-18 17:48 ` [Qemu-devel] [PATCH v7 1/9] mirror: inherit supported write/zero flags Anton Nefedov
2018-01-29 19:21   ` Max Reitz
2018-01-29 19:26     ` Eric Blake
2018-01-30 12:15       ` Anton Nefedov
2018-01-30 16:00         ` Eric Blake
2018-01-31 17:20         ` Max Reitz
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 2/9] blkverify: set " Anton Nefedov
2018-01-29 19:23   ` Max Reitz
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
2018-01-29 19:37   ` Max Reitz
2018-01-30 12:34     ` Anton Nefedov
2018-01-31 17:31       ` Max Reitz
2018-02-01 13:34         ` Anton Nefedov
2018-02-01 18:06           ` Max Reitz
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
2018-01-29 19:48   ` Max Reitz
2018-01-30 12:36     ` Anton Nefedov
2018-01-31 17:35       ` Max Reitz
2018-01-31 15:11   ` Alberto Garcia
2018-01-31 17:11     ` Anton Nefedov
2018-02-01 14:01       ` Alberto Garcia
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
2018-01-29 19:56   ` Max Reitz
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
2018-01-29 19:58   ` Max Reitz
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 7/9] qcow2: move is_zero() up Anton Nefedov
2018-01-29 19:59   ` Max Reitz
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2018-01-29 20:28   ` Max Reitz
2018-01-30 14:23     ` Anton Nefedov
2018-01-31 17:40       ` Max Reitz
2018-01-31 18:32         ` Eric Blake
2018-01-31 18:35           ` Max Reitz
2018-01-31 18:43             ` Eric Blake
2018-01-18 17:49 ` [Qemu-devel] [PATCH v7 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
2018-01-29 20:30   ` Max Reitz
2018-01-30 13:03   ` Alberto Garcia

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.