All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation
@ 2018-03-12 10:16 Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags Anton Nefedov
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

v8: - patch 1: flags setup moved to a proper place
    - new patch 3 for quorum driver
    - patch 4: expanded the flag description and minor relocations of sanity
               checks
    - patch 5: function renamed
    - patch 6: minor rebase-related changes
    - patch 7: quorum added
    - patch 8: use bdrv_is_allocated interface, which allows a less strict
               zero detection thanks to block_status 'want_zero' flag.
               Other minor changes (encrypted flag check moved).

v7: http://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg04470.html

Anton Nefedov (9):
  mirror: inherit supported write/zero flags
  blkverify: set supported write/zero flags
  quorum: set supported write/zero flags
  block: introduce BDRV_REQ_ALLOCATE flag
  block: treat BDRV_REQ_ALLOCATE as serialising
  file-posix: support BDRV_REQ_ALLOCATE
  block: support BDRV_REQ_ALLOCATE in passthrough drivers
  qcow2: skip writing zero buffers to empty COW areas
  iotest 134: test cluster-misaligned encrypted write

 qapi/block-core.json       |  4 ++-
 block/qcow2.h              |  6 ++++
 include/block/block.h      |  8 ++++-
 include/block/block_int.h  |  2 +-
 block/blkdebug.c           |  3 +-
 block/blkverify.c          |  9 ++++++
 block/file-posix.c         | 10 +++++-
 block/io.c                 | 50 +++++++++++++++++++++++------
 block/mirror.c             | 11 +++++++
 block/qcow2-cluster.c      |  2 +-
 block/qcow2.c              | 79 ++++++++++++++++++++++++++++++++++++++++++++--
 block/quorum.c             | 21 ++++++++++++
 block/raw-format.c         |  3 +-
 block/trace-events         |  1 +
 tests/qemu-iotests/060     | 26 +++++++++------
 tests/qemu-iotests/060.out |  5 ++-
 tests/qemu-iotests/134     |  9 ++++++
 tests/qemu-iotests/134.out | 10 ++++++
 18 files changed, 230 insertions(+), 29 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-16 10:35   ` Alberto Garcia
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 2/9] blkverify: set " Anton Nefedov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/mirror.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index f5bf620..2fb786f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1100,6 +1100,15 @@ static BlockDriver bdrv_mirror_top = {
     .bdrv_child_perm            = bdrv_mirror_top_child_perm,
 };
 
+static void mirror_top_set_supported_flags(BlockDriverState *bs)
+{
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->backing->bs->supported_write_flags;
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->backing->bs->supported_zero_flags;
+}
+
 static void mirror_start_job(const char *job_id, BlockDriverState *bs,
                              int creation_flags, BlockDriverState *target,
                              const char *replaces, int64_t speed,
@@ -1165,6 +1174,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         return;
     }
 
+    mirror_top_set_supported_flags(mirror_top_bs);
+
     /* Make sure that the source is not resized while the job is running */
     s = block_job_create(job_id, driver, mirror_top_bs,
                          BLK_PERM_CONSISTENT_READ,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 2/9] blkverify: set supported write/zero flags
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 3/9] quorum: " Anton Nefedov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/blkverify.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blkverify.c b/block/blkverify.c
index 331365b..de2fdc1 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -141,6 +141,15 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    bs->supported_write_flags = BDRV_REQ_FUA &
+        bs->file->bs->supported_write_flags &
+        s->test_file->bs->supported_write_flags;
+
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        bs->file->bs->supported_zero_flags &
+        s->test_file->bs->supported_zero_flags;
+
     ret = 0;
 fail:
     qemu_opts_del(opts);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 3/9] quorum: set supported write/zero flags
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 2/9] blkverify: set " Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-16 10:39   ` Alberto Garcia
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/quorum.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 14333c1..dc77a23 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -867,6 +867,20 @@ static QemuOptsList quorum_runtime_opts = {
     },
 };
 
+static void quorum_set_supported_flags(BlockDriverState *bs)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i;
+
+    bs->supported_write_flags = BDRV_REQ_FUA;
+    bs->supported_zero_flags  = BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP;
+
+    for (i = 0; i < s->num_children; i++) {
+        bs->supported_write_flags &= s->children[i]->bs->supported_write_flags;
+        bs->supported_zero_flags  &= s->children[i]->bs->supported_zero_flags;
+    }
+}
+
 static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
                        Error **errp)
 {
@@ -961,6 +975,8 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
     s->next_child_index = s->num_children;
 
+    quorum_set_supported_flags(bs);
+
     g_free(opened);
     goto exit;
 
@@ -1029,6 +1045,8 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
     s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
     s->children[s->num_children++] = child;
 
+    quorum_set_supported_flags(bs);
+
 out:
     bdrv_drained_end(bs);
 }
@@ -1064,6 +1082,8 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
     bdrv_unref_child(bs, child);
 
     bdrv_drained_end(bs);
+
+    quorum_set_supported_flags(bs);
 }
 
 static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (2 preceding siblings ...)
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 3/9] quorum: " Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-16 11:01   ` Alberto Garcia
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 5/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

The flag is supposed to indicate that the region of the disk image has
to be sufficiently allocated so it reads as zeroes.

The call with the flag set must return -ENOTSUP if allocation cannot
be done efficiently.
This has to be made sure of by both
  - the drivers that support the flag
  - and the common block layer (so it will not fall back to any slowpath
    (like writing zero buffers) in case the driver does not support
    the flag).

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 include/block/block.h     |  8 +++++++-
 include/block/block_int.h |  2 +-
 block/io.c                | 18 ++++++++++++++++--
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 8b6db95..9747632 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -53,9 +53,15 @@ typedef enum {
     BDRV_REQ_NO_SERIALISING     = 0x8,
     BDRV_REQ_FUA                = 0x10,
     BDRV_REQ_WRITE_COMPRESSED   = 0x20,
+    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
+     * efficiently allocate the space so it reads as zeroes, or return an error.
+     * Must be used together with BDRV_REQ_ZERO_WRITE.
+     * Contradictory to BDRV_REQ_MAY_UNMAP so the two must not be used together.
+     */
+    BDRV_REQ_ALLOCATE           = 0x40,
 
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x3f,
+    BDRV_REQ_MASK               = 0x7f,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 64a5700..9495e35 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -648,7 +648,7 @@ struct BlockDriverState {
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
     unsigned int supported_write_flags;
     /* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
-     * BDRV_REQ_MAY_UNMAP) */
+     * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
     unsigned int supported_zero_flags;
 
     /* the following member gives a name to every node on the bs graph. */
diff --git a/block/io.c b/block/io.c
index 2b09c65..c4f2a07 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1418,7 +1418,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
             assert(!bs->supported_zero_flags);
         }
 
-        if (ret == -ENOTSUP) {
+        if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
             /* Fall back to bounce buffer if write zeroes is unsupported */
             BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
 
@@ -1511,7 +1511,9 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
         !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
         qemu_iovec_is_zero(qiov)) {
         flags |= BDRV_REQ_ZERO_WRITE;
-        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
+        if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
+            !(flags & BDRV_REQ_ALLOCATE))
+        {
             flags |= BDRV_REQ_MAY_UNMAP;
         }
     }
@@ -1587,6 +1589,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
     assert(flags & BDRV_REQ_ZERO_WRITE);
     if (head_padding_bytes || tail_padding_bytes) {
+        if (flags & BDRV_REQ_ALLOCATE) {
+            return -ENOTSUP;
+        }
         buf = qemu_blockalign(bs, align);
         iov = (struct iovec) {
             .iov_base   = buf,
@@ -1672,6 +1677,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
     bool use_local_qiov = false;
     int ret;
 
+    assert(!((flags & BDRV_REQ_ALLOCATE) && (flags & BDRV_REQ_MAY_UNMAP)));
+    assert(!((flags & BDRV_REQ_ALLOCATE) && !(flags & BDRV_REQ_ZERO_WRITE)));
+
     trace_bdrv_co_pwritev(child->bs, offset, bytes, flags);
 
     if (!bs->drv) {
@@ -1816,6 +1824,12 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
 {
     trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
 
+    if ((flags & BDRV_REQ_ALLOCATE) &&
+        !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+    {
+        return -ENOTSUP;
+    }
+
     if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
         flags &= ~BDRV_REQ_MAY_UNMAP;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 5/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (3 preceding siblings ...)
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-19 11:02   ` Alberto Garcia
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 6/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

The idea is that ALLOCATE requests may overlap with other requests.
Reuse the existing block layer infrastructure for serialising requests.
Use the following approach:
  - mark ALLOCATE serialising, so subsequent requests to the area wait
  - ALLOCATE request itself must never wait if another request is in flight
    already. Return EAGAIN, let the caller reconsider.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/io.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index c4f2a07..5d74603 100644
--- a/block/io.c
+++ b/block/io.c
@@ -599,12 +599,13 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
     bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn find_or_wait_serialising_requests(
+    BdrvTrackedRequest *self, bool wait)
 {
     BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
     bool retry;
-    bool waited = false;
+    bool found = false;
 
     if (!atomic_read(&bs->serialising_in_flight)) {
         return false;
@@ -630,11 +631,14 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                  * will wait for us as soon as it wakes up, then just go on
                  * (instead of producing a deadlock in the former case). */
                 if (!req->waiting_for) {
+                    found = true;
+                    if (!wait) {
+                        break;
+                    }
                     self->waiting_for = req;
                     qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
                     self->waiting_for = NULL;
                     retry = true;
-                    waited = true;
                     break;
                 }
             }
@@ -642,7 +646,12 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
         qemu_co_mutex_unlock(&bs->reqs_lock);
     } while (retry);
 
-    return waited;
+    return found;
+}
+
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+{
+    return find_or_wait_serialising_requests(self, true);
 }
 
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
@@ -1474,7 +1483,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
 {
     BlockDriverState *bs = child->bs;
     BlockDriver *drv = bs->drv;
-    bool waited;
+    bool found;
     int ret;
 
     int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
@@ -1498,8 +1507,13 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
     max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
                                    align);
 
-    waited = wait_serialising_requests(req);
-    assert(!waited || !req->serialising);
+    found = find_or_wait_serialising_requests(req,
+                                              !(flags & BDRV_REQ_ALLOCATE));
+    if (found && (flags & BDRV_REQ_ALLOCATE)) {
+        return -EAGAIN;
+    }
+
+    assert(!found || !req->serialising);
     assert(req->overlap_offset <= offset);
     assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
     assert(child->perm & BLK_PERM_WRITE);
@@ -1624,6 +1638,10 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
         bytes -= zero_bytes;
     }
 
+    if (flags & BDRV_REQ_ALLOCATE) {
+        mark_request_serialising(req, align);
+    }
+
     assert(!bytes || (offset & (align - 1)) == 0);
     if (bytes >= align) {
         /* Write the aligned part in the middle. */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 6/9] file-posix: support BDRV_REQ_ALLOCATE
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (4 preceding siblings ...)
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 5/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-16 14:53   ` Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Current write_zeroes implementation is good enough to satisfy this flag too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/file-posix.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 7f2cc63..2136df9 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -560,7 +560,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
     }
     if (S_ISREG(st.st_mode)) {
         s->discard_zeroes = true;
+#ifdef CONFIG_FALLOCATE
         s->has_fallocate = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
+#endif
     }
     if (S_ISBLK(st.st_mode)) {
 #ifdef BLKDISCARDZEROES
@@ -595,10 +598,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 #ifdef CONFIG_XFS
     if (platform_test_xfs_fd(s->fd)) {
         s->is_xfs = true;
+        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
     }
 #endif
 
-    bs->supported_zero_flags = s->discard_zeroes ? BDRV_REQ_MAY_UNMAP : 0;
+    bs->supported_zero_flags |= s->discard_zeroes ? BDRV_REQ_MAY_UNMAP : 0;
     ret = 0;
 fail:
     if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
@@ -1416,6 +1420,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
         }
         s->has_fallocate = false;
     }
+
+    if (!s->has_fallocate) {
+        aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
+    }
 #endif
 
     return -ENOTSUP;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (5 preceding siblings ...)
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 6/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-16 11:03   ` Alberto Garcia
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

Support the flag if the underlying BDS supports it

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/blkdebug.c   | 3 ++-
 block/blkverify.c  | 2 +-
 block/mirror.c     | 2 +-
 block/quorum.c     | 3 ++-
 block/raw-format.c | 3 ++-
 5 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 5897124..d46f6c9 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -400,7 +400,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
     ret = -EINVAL;
 
diff --git a/block/blkverify.c b/block/blkverify.c
index de2fdc1..03bec62 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -146,7 +146,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
         s->test_file->bs->supported_write_flags;
 
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags &
         s->test_file->bs->supported_zero_flags;
 
diff --git a/block/mirror.c b/block/mirror.c
index 2fb786f..58f006f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1105,7 +1105,7 @@ static void mirror_top_set_supported_flags(BlockDriverState *bs)
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->backing->bs->supported_write_flags;
     bs->supported_zero_flags =
-        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->backing->bs->supported_zero_flags;
 }
 
diff --git a/block/quorum.c b/block/quorum.c
index dc77a23..e369e79 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -873,7 +873,8 @@ static void quorum_set_supported_flags(BlockDriverState *bs)
     int i;
 
     bs->supported_write_flags = BDRV_REQ_FUA;
-    bs->supported_zero_flags  = BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP;
+    bs->supported_zero_flags  =
+        BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE;
 
     for (i = 0; i < s->num_children; i++) {
         bs->supported_write_flags &= s->children[i]->bs->supported_write_flags;
diff --git a/block/raw-format.c b/block/raw-format.c
index a378547..ca65f86 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -417,7 +417,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
     bs->sg = bs->file->bs->sg;
     bs->supported_write_flags = BDRV_REQ_FUA &
         bs->file->bs->supported_write_flags;
-    bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+    bs->supported_zero_flags =
+        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
         bs->file->bs->supported_zero_flags;
 
     if (bs->probed && !bdrv_is_read_only(bs)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (6 preceding siblings ...)
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  2018-03-16 13:58   ` Alberto Garcia
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
  8 siblings, 1 reply; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

If COW areas of the newly allocated clusters are zeroes on the backing image,
efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
cluster instead of writing explicit zero buffers later in perform_cow().

iotest 060:
write to the discarded cluster does not trigger COW anymore.
Use a backing image instead.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 qapi/block-core.json       |  4 ++-
 block/qcow2.h              |  6 ++++
 block/qcow2-cluster.c      |  2 +-
 block/qcow2.c              | 79 ++++++++++++++++++++++++++++++++++++++++++++--
 block/trace-events         |  1 +
 tests/qemu-iotests/060     | 26 +++++++++------
 tests/qemu-iotests/060.out |  5 ++-
 7 files changed, 108 insertions(+), 15 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 00475f0..da61a0d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2584,6 +2584,8 @@
 #
 # @cor_write: a write due to copy-on-read (since 2.11)
 #
+# @cluster_alloc_space: an allocation of a cluster file space (since 2.12)
+#
 # Since: 2.9
 ##
 { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
@@ -2602,7 +2604,7 @@
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
             'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
             'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-            'cor_write'] }
+            'cor_write', 'cluster_alloc_space'] }
 
 ##
 # @BlkdebugInjectErrorOptions:
diff --git a/block/qcow2.h b/block/qcow2.h
index 1a84cc7..2cf7dfd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -379,6 +379,12 @@ typedef struct QCowL2Meta
     Qcow2COWRegion cow_end;
 
     /**
+     * Indicates that COW regions are already handled and do not require
+     * any more processing.
+     */
+    bool skip_cow;
+
+    /**
      * The I/O vector with the data from the actual guest write request.
      * If non-NULL, this is meant to be merged together with the data
      * from @cow_start and @cow_end into one single write operation.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 98908c4..0624d65 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -805,7 +805,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->offset + start->nb_bytes <= end->offset);
     assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
-    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
+    if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->skip_cow) {
         return 0;
     }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 071dc4d..2b0ec3c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1866,6 +1866,11 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
             continue;
         }
 
+        /* If COW regions are handled already, skip this too */
+        if (m->skip_cow) {
+            continue;
+        }
+
         /* The data (middle) region must be immediately after the
          * start region */
         if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
@@ -1891,6 +1896,67 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
     return false;
 }
 
+static bool is_unallocated(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+    int64_t nr;
+    return !bytes ||
+        (!bdrv_is_allocated_above(bs, NULL, offset, bytes, &nr) && nr == bytes);
+}
+
+static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+{
+    /* content with false negatives, giving is_allocated() is faster than
+     * a proper zero detection with possible actual image seeks, which is
+     * performed by is_zero() */
+    return is_unallocated(bs, m->offset + m->cow_start.offset,
+                          m->cow_start.nb_bytes) &&
+           is_unallocated(bs, m->offset + m->cow_end.offset,
+                          m->cow_end.nb_bytes);
+}
+
+static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+    BDRVQcow2State *s = bs->opaque;
+    QCowL2Meta *m;
+
+    if (!(bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE)) {
+        return 0;
+    }
+
+    if (bs->encrypted) {
+        return 0;
+    }
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        int ret;
+
+        if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
+            continue;
+        }
+
+        if (!is_zero_cow(bs, m)) {
+            continue;
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_SPACE);
+        /* instead of writing zero COW buffers,
+           efficiently zero out the whole clusters */
+        ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
+                                    m->nb_clusters * s->cluster_size,
+                                    BDRV_REQ_ALLOCATE);
+        if (ret < 0) {
+            if (ret != -ENOTSUP && ret != -EAGAIN) {
+                return ret;
+            }
+            continue;
+        }
+
+        trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
+        m->skip_cow = true;
+    }
+    return 0;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -1973,24 +2039,33 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
+        qemu_co_mutex_unlock(&s->lock);
+
+        ret = handle_alloc_space(bs, l2meta);
+        if (ret < 0) {
+            qemu_co_mutex_lock(&s->lock);
+            goto fail;
+        }
+
         /* If we need to do COW, check if it's possible to merge the
          * writing of the guest data together with that of the COW regions.
          * If it's not possible (or not necessary) then write the
          * guest data now. */
         if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
-            qemu_co_mutex_unlock(&s->lock);
             BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
             trace_qcow2_writev_data(qemu_coroutine_self(),
                                     cluster_offset + offset_in_cluster);
             ret = bdrv_co_pwritev(bs->file,
                                   cluster_offset + offset_in_cluster,
                                   cur_bytes, &hd_qiov, 0);
-            qemu_co_mutex_lock(&s->lock);
             if (ret < 0) {
+                qemu_co_mutex_lock(&s->lock);
                 goto fail;
             }
         }
 
+        qemu_co_mutex_lock(&s->lock);
+
         while (l2meta != NULL) {
             QCowL2Meta *next;
 
diff --git a/block/trace-events b/block/trace-events
index 02dd80f..2045ac9 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
 qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
 qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
 qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
+qcow2_skip_cow(void *co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
 
 # block/qcow2-cluster.c
 qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 6c7407f..e6ed8b2 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -143,27 +143,33 @@ $QEMU_IO -c "$OPEN_RO" -c "read -P 1 0 512" | _filter_qemu_io
 echo
 echo "=== Testing overlap while COW is in flight ==="
 echo
+BACKING_IMG=$TEST_IMG.base
+TEST_IMG=$BACKING_IMG _make_test_img 1G
+
+$QEMU_IO -c 'write 64k 64k' "$BACKING_IMG" | _filter_qemu_io
+
 # compat=0.10 is required in order to make the following discard actually
-# unallocate the sector rather than make it a zero sector - we want COW, after
-# all.
-IMGOPTS='compat=0.10' _make_test_img 1G
+# unallocate the sector rather than make it a zero sector as we would like
+# to reuse it for another guest offset
+IMGOPTS='compat=0.10' _make_test_img -b "$BACKING_IMG" 1G
 # Write two clusters, the second one enforces creation of an L2 table after
 # the first data cluster.
 $QEMU_IO -c 'write 0k 64k' -c 'write 512M 64k' "$TEST_IMG" | _filter_qemu_io
-# Discard the first cluster. This cluster will soon enough be reallocated and
-# used for COW.
+# Discard the first cluster. This cluster will soon enough be reallocated
 $QEMU_IO -c 'discard 0k 64k' "$TEST_IMG" | _filter_qemu_io
 # Now, corrupt the image by marking the second L2 table cluster as free.
 poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
-# Start a write operation requiring COW on the image stopping it right before
-# doing the read; then, trigger the corruption prevention by writing anything to
-# any unallocated cluster, leading to an attempt to overwrite the second L2
+# Start a write operation requiring COW on the image;
+# this write will reuse the host offset released by a previous discard.
+# Stop it right before doing the read.
+# Then, trigger the corruption prevention by writing anything to
+# another unallocated cluster, leading to an attempt to overwrite the second L2
 # table. Finally, resume the COW write and see it fail (but not crash).
 echo "open -o file.driver=blkdebug $TEST_IMG
 break cow_read 0
-aio_write 0k 1k
+aio_write 64k 1k
 wait_break 0
-write 64k 64k
+write 128k 64k
 resume 0" | $QEMU_IO | _filter_qemu_io
 
 echo
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index 25d5c39..00a6702 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -97,7 +97,10 @@ read 512/512 bytes at offset 0
 
 === Testing overlap while COW is in flight ===
 
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1073741824
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 65536/65536 bytes at offset 536870912
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v8 9/9] iotest 134: test cluster-misaligned encrypted write
  2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
                   ` (7 preceding siblings ...)
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-03-12 10:16 ` Anton Nefedov
  8 siblings, 0 replies; 19+ messages in thread
From: Anton Nefedov @ 2018-03-12 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 9914415..6083ae4 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -59,6 +59,15 @@ echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
 echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index 972be49..09d46f6 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -5,6 +5,16 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags Anton Nefedov
@ 2018-03-16 10:35   ` Alberto Garcia
  0 siblings, 0 replies; 19+ messages in thread
From: Alberto Garcia @ 2018-03-16 10:35 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 12 Mar 2018 11:16:50 AM CET, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
>  block/mirror.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index f5bf620..2fb786f 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1100,6 +1100,15 @@ static BlockDriver bdrv_mirror_top = {
>      .bdrv_child_perm            = bdrv_mirror_top_child_perm,
>  };
>  
> +static void mirror_top_set_supported_flags(BlockDriverState *bs)
> +{
> +    bs->supported_write_flags = BDRV_REQ_FUA &
> +        bs->backing->bs->supported_write_flags;
> +    bs->supported_zero_flags =
> +        (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
> +        bs->backing->bs->supported_zero_flags;
> +}
> +

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 3/9] quorum: set supported write/zero flags
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 3/9] quorum: " Anton Nefedov
@ 2018-03-16 10:39   ` Alberto Garcia
  0 siblings, 0 replies; 19+ messages in thread
From: Alberto Garcia @ 2018-03-16 10:39 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 12 Mar 2018 11:16:52 AM CET, Anton Nefedov wrote:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2018-03-16 11:01   ` Alberto Garcia
  2018-03-16 14:52     ` Anton Nefedov
  0 siblings, 1 reply; 19+ messages in thread
From: Alberto Garcia @ 2018-03-16 11:01 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 12 Mar 2018 11:16:53 AM CET, Anton Nefedov wrote:
> The flag is supposed to indicate that the region of the disk image has
> to be sufficiently allocated so it reads as zeroes.
>
> The call with the flag set must return -ENOTSUP if allocation cannot
> be done efficiently.
> This has to be made sure of by both
>   - the drivers that support the flag
>   - and the common block layer (so it will not fall back to any slowpath
>     (like writing zero buffers) in case the driver does not support
>     the flag).
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>

Hmm... this is not the patch that I reviewed, please remove the R-b
lines when you submit a modified patch.

> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
> +     * efficiently allocate the space so it reads as zeroes, or return an error.
> +     * Must be used together with BDRV_REQ_ZERO_WRITE.

I guess it's understandable from the context but it's not immediately
obvious that you can set BDRV_REQ_ZERO_WRITE but not this one.

Perhaps it could be rephrased as something like "If this is set then
BDRV_REQ_ZERO_WRITE must also be set".

> +     * Contradictory to BDRV_REQ_MAY_UNMAP so the two must not be used together.

Or simply "This flag cannot be set together with BDRV_REQ_MAY_UNMAP".

> +     */
> +    BDRV_REQ_ALLOCATE           = 0x40,

Berto

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2018-03-16 11:03   ` Alberto Garcia
  0 siblings, 0 replies; 19+ messages in thread
From: Alberto Garcia @ 2018-03-16 11:03 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 12 Mar 2018 11:16:56 AM CET, Anton Nefedov wrote:
> Support the flag if the underlying BDS supports it
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>

This is again not the patch that I reviewed (the quorum bits were not
there), but I just check the changes and they look fine so you can keep
my Reviewed-by line.

Berto

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-03-16 13:58   ` Alberto Garcia
  2018-03-16 15:09     ` Anton Nefedov
  0 siblings, 1 reply; 19+ messages in thread
From: Alberto Garcia @ 2018-03-16 13:58 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 12 Mar 2018 11:16:57 AM CET, Anton Nefedov wrote:
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2584,6 +2584,8 @@
>  #
>  # @cor_write: a write due to copy-on-read (since 2.11)
>  #
> +# @cluster_alloc_space: an allocation of a cluster file space (since 2.12)

That doesn't sound like correct English to me.

> +static bool is_unallocated(BlockDriverState *bs, int64_t offset, int64_t bytes)
> +{
> +    int64_t nr;
> +    return !bytes ||
> +        (!bdrv_is_allocated_above(bs, NULL, offset, bytes, &nr) && nr == bytes);
> +}
> +

Is this really more efficient than the previous is_zero() call ?

It seems that in both cases the code ends up calling
bdrv_common_block_status_above().

> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
> +{
> +    /* content with false negatives, giving is_allocated() is faster than
> +     * a proper zero detection with possible actual image seeks, which is
> +     * performed by is_zero() */

It took me a bit to understand this sentence, maybe some native English
speaker can suggest an alternate wording?

  "is_allocated() is not as accurate as is_zero() and can give us some
   false negatives, but it is much more efficient so let's use it here"

> +    return is_unallocated(bs, m->offset + m->cow_start.offset,
> +                          m->cow_start.nb_bytes) &&
> +           is_unallocated(bs, m->offset + m->cow_end.offset,
> +                          m->cow_end.nb_bytes);
> +}

Berto

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag
  2018-03-16 11:01   ` Alberto Garcia
@ 2018-03-16 14:52     ` Anton Nefedov
  0 siblings, 0 replies; 19+ messages in thread
From: Anton Nefedov @ 2018-03-16 14:52 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den



On 16/3/2018 2:01 PM, Alberto Garcia wrote:
> On Mon 12 Mar 2018 11:16:53 AM CET, Anton Nefedov wrote:
>> The flag is supposed to indicate that the region of the disk image has
>> to be sufficiently allocated so it reads as zeroes.
>>
>> The call with the flag set must return -ENOTSUP if allocation cannot
>> be done efficiently.
>> This has to be made sure of by both
>>    - the drivers that support the flag
>>    - and the common block layer (so it will not fall back to any slowpath
>>      (like writing zero buffers) in case the driver does not support
>>      the flag).
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
> 
> Hmm... this is not the patch that I reviewed, please remove the R-b
> lines when you submit a modified patch.
> 

Oops. sorry

>> +    /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
>> +     * efficiently allocate the space so it reads as zeroes, or return an error.
>> +     * Must be used together with BDRV_REQ_ZERO_WRITE.
> 
> I guess it's understandable from the context but it's not immediately
> obvious that you can set BDRV_REQ_ZERO_WRITE but not this one.
> 
> Perhaps it could be rephrased as something like "If this is set then
> BDRV_REQ_ZERO_WRITE must also be set".
> 

Sound much better, fixed

>> +     * Contradictory to BDRV_REQ_MAY_UNMAP so the two must not be used together.
> 
> Or simply "This flag cannot be set together with BDRV_REQ_MAY_UNMAP".
> 

Done

>> +     */
>> +    BDRV_REQ_ALLOCATE           = 0x40,
> 
> Berto
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 6/9] file-posix: support BDRV_REQ_ALLOCATE
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 6/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2018-03-16 14:53   ` Anton Nefedov
  0 siblings, 0 replies; 19+ messages in thread
From: Anton Nefedov @ 2018-03-16 14:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto



On 12/3/2018 1:16 PM, Anton Nefedov wrote:
> Current write_zeroes implementation is good enough to satisfy this flag too
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>

strictly speaking this patch differs from v8 as well
(rebase required an extra change, see below)
so perhaps I should have stripped the review tags too

> ---
>   block/file-posix.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 7f2cc63..2136df9 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -560,7 +560,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>       }
>       if (S_ISREG(st.st_mode)) {
>           s->discard_zeroes = true;
> +#ifdef CONFIG_FALLOCATE
>           s->has_fallocate = true;
> +        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
> +#endif
>       }
>       if (S_ISBLK(st.st_mode)) {
>   #ifdef BLKDISCARDZEROES
> @@ -595,10 +598,11 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>   #ifdef CONFIG_XFS
>       if (platform_test_xfs_fd(s->fd)) {
>           s->is_xfs = true;
> +        bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
>       }
>   #endif
>   
> -    bs->supported_zero_flags = s->discard_zeroes ? BDRV_REQ_MAY_UNMAP : 0;
> +    bs->supported_zero_flags |= s->discard_zeroes ? BDRV_REQ_MAY_UNMAP : 0;
>       ret = 0;

(this change is new in v9)

>   fail:
>       if (filename && (bdrv_flags & BDRV_O_TEMPORARY)) {
> @@ -1416,6 +1420,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
>           }
>           s->has_fallocate = false;
>       }
> +
> +    if (!s->has_fallocate) {
> +        aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
> +    }
>   #endif
>   
>       return -ENOTSUP;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas
  2018-03-16 13:58   ` Alberto Garcia
@ 2018-03-16 15:09     ` Anton Nefedov
  0 siblings, 0 replies; 19+ messages in thread
From: Anton Nefedov @ 2018-03-16 15:09 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den



On 16/3/2018 4:58 PM, Alberto Garcia wrote:
> On Mon 12 Mar 2018 11:16:57 AM CET, Anton Nefedov wrote:
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2584,6 +2584,8 @@
>>   #
>>   # @cor_write: a write due to copy-on-read (since 2.11)
>>   #
>> +# @cluster_alloc_space: an allocation of a cluster file space (since 2.12)

now it's 2.13 I believe

> 
> That doesn't sound like correct English to me.
> 

how about
   "an allocation of file space for a cluster"

>> +static bool is_unallocated(BlockDriverState *bs, int64_t offset, int64_t bytes)
>> +{
>> +    int64_t nr;
>> +    return !bytes ||
>> +        (!bdrv_is_allocated_above(bs, NULL, offset, bytes, &nr) && nr == bytes);
>> +}
>> +
> 
> Is this really more efficient than the previous is_zero() call ?
> 
> It seems that in both cases the code ends up calling
> bdrv_common_block_status_above().
> 

the difference is that is_allocated passes 'want_zero = 0'.
It hints drivers to skip thorough seeks for zeroes (e.g. file-posix
returns (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) right away).
Moreover, bdrv_co_block_status() even skips the check in local_file
unless the protocol driver returned BDRV_BLOCK_RAW.

>> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
>> +{
>> +    /* content with false negatives, giving is_allocated() is faster than
>> +     * a proper zero detection with possible actual image seeks, which is
>> +     * performed by is_zero() */
> 
> It took me a bit to understand this sentence, maybe some native English
> speaker can suggest an alternate wording?
> 
>    "is_allocated() is not as accurate as is_zero() and can give us some
>     false negatives, but it is much more efficient so let's use it here"
> 

wording is the hardest part of it all :)
maybe:

   /* This check is designed for optimization shortcut so it must be
    * efficient.
    * Instead of is_zero(), use is_unallocated() as it is faster (but not
    * as accurate and can result in false negatives). */

>> +    return is_unallocated(bs, m->offset + m->cow_start.offset,
>> +                          m->cow_start.nb_bytes) &&
>> +           is_unallocated(bs, m->offset + m->cow_end.offset,
>> +                          m->cow_end.nb_bytes);
>> +}
> 
> Berto
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH v8 5/9] block: treat BDRV_REQ_ALLOCATE as serialising
  2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 5/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2018-03-19 11:02   ` Alberto Garcia
  0 siblings, 0 replies; 19+ messages in thread
From: Alberto Garcia @ 2018-03-19 11:02 UTC (permalink / raw)
  To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den

On Mon 12 Mar 2018 11:16:54 AM CET, Anton Nefedov wrote:
> The idea is that ALLOCATE requests may overlap with other requests.
> Reuse the existing block layer infrastructure for serialising requests.
> Use the following approach:
>   - mark ALLOCATE serialising, so subsequent requests to the area wait
>   - ALLOCATE request itself must never wait if another request is in flight
>     already. Return EAGAIN, let the caller reconsider.
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

> @@ -1498,8 +1507,13 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
>      max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
>                                     align);
>  
> -    waited = wait_serialising_requests(req);
> -    assert(!waited || !req->serialising);
> +    found = find_or_wait_serialising_requests(req,
> +                                              !(flags & BDRV_REQ_ALLOCATE));
> +    if (found && (flags & BDRV_REQ_ALLOCATE)) {
> +        return -EAGAIN;
> +    }
> +

Another alternative (perhaps a bit more readable):

    if (flags & BDRV_REQ_ALLOCATE) {
        if (find_or_wait_serialising_requests(req, false)) {
            return -EAGAIN;
        }
    } else {
        bool found = wait_serialising_requests(req);
        assert(!found || !req->serialising);
    }

but yours is fine, so keep it if you prefer.

Berto

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-03-19 11:48 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12 10:16 [Qemu-devel] [PATCH v8 0/9] qcow2: cluster space preallocation Anton Nefedov
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 1/9] mirror: inherit supported write/zero flags Anton Nefedov
2018-03-16 10:35   ` Alberto Garcia
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 2/9] blkverify: set " Anton Nefedov
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 3/9] quorum: " Anton Nefedov
2018-03-16 10:39   ` Alberto Garcia
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 4/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
2018-03-16 11:01   ` Alberto Garcia
2018-03-16 14:52     ` Anton Nefedov
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 5/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
2018-03-19 11:02   ` Alberto Garcia
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 6/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
2018-03-16 14:53   ` Anton Nefedov
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 7/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
2018-03-16 11:03   ` Alberto Garcia
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2018-03-16 13:58   ` Alberto Garcia
2018-03-16 15:09     ` Anton Nefedov
2018-03-12 10:16 ` [Qemu-devel] [PATCH v8 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.