qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/4] qcow2: async handling of fragmented io
@ 2019-07-30 14:18 Vladimir Sementsov-Ogievskiy
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-07-30 14:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, vsementsov, armbru, mreitz, stefanha, den

Hi all!

Here is an asynchronous scheme for handling fragmented qcow2
reads and writes. Both qcow2 read and write functions loops through
sequential portions of data. The series aim it to parallelize these
loops iterations.
It improves performance for fragmented qcow2 images, I've tested it
as described below.

v2: changed a lot, as
 1. a lot of preparations around locks, hd_qiovs, threads for encryption
    are done
 2. I decided to create separate file with async request handling API, to
    reuse it for backup, stream and copy-on-read to improve their performance
    too. Mirror and qemu-img convert has their own async request handling,
    may be we'll be able finally merge all these similar code into one
    feature.
    Note that not all API calls used in qcow2, some will be needed on
    following steps for parallelizing other io loops.

Based-on: https://github.com/stefanha/qemu/commits/block

About testing:

I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
t-seq.qcow2 - sequentially written qcow2 image
t-reverse.qcow2 - filled by writing 64k portions from end to the start
t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
(see source code of image generation in the end for details)

and I've done several runs like the following (sequential io by 1mb chunks):

    out=/tmp/block; echo > $out; cat /tmp/files | while read file; do for wr in {"","-w"}; do echo "$file" $wr; ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $wr "$file" | grep 'Run completed in' | awk '{print $4}' >> $out; done; done


short info about parameters:
  -w - do writes (otherwise do reads)
  -c - count of blocks
  -s - block size
  -t none - disable cache
  -n - native aio
  -d 1 - don't use parallel requests provided by qemu-img bench itself

results:

    +---------------------------+---------+---------+
    | file                      | master  | async   |
    +---------------------------+---------+---------+
    | /ssd/t-part-rand.qcow2    | 14.671  | 9.193   |
    +---------------------------+---------+---------+
    | /ssd/t-part-rand.qcow2 -w | 11.434  | 8.621   |
    +---------------------------+---------+---------+
    | /ssd/t-rand.qcow2         | 20.421  | 10.05   |
    +---------------------------+---------+---------+
    | /ssd/t-rand.qcow2 -w      | 11.097  | 8.915   |
    +---------------------------+---------+---------+
    | /ssd/t-reverse.qcow2      | 17.515  | 9.407   |
    +---------------------------+---------+---------+
    | /ssd/t-reverse.qcow2 -w   | 11.255  | 8.649   |
    +---------------------------+---------+---------+
    | /ssd/t-seq.qcow2          | 9.081   | 9.072   |
    +---------------------------+---------+---------+
    | /ssd/t-seq.qcow2 -w       | 8.761   | 8.747   |
    +---------------------------+---------+---------+
    | /tmp/t-part-rand.qcow2    | 41.179  | 41.37   |
    +---------------------------+---------+---------+
    | /tmp/t-part-rand.qcow2 -w | 54.097  | 55.323  |
    +---------------------------+---------+---------+
    | /tmp/t-rand.qcow2         | 711.899 | 514.339 |
    +---------------------------+---------+---------+
    | /tmp/t-rand.qcow2 -w      | 546.259 | 642.114 |
    +---------------------------+---------+---------+
    | /tmp/t-reverse.qcow2      | 86.065  | 96.522  |
    +---------------------------+---------+---------+
    | /tmp/t-reverse.qcow2 -w   | 46.557  | 48.499  |
    +---------------------------+---------+---------+
    | /tmp/t-seq.qcow2          | 33.804  | 33.862  |
    +---------------------------+---------+---------+
    | /tmp/t-seq.qcow2 -w       | 34.299  | 34.233  |
    +---------------------------+---------+---------+


Performance gain is obvious, especially for read and especially for ssd.
For hdd there is a degradation for reverse case, but this is the most
impossible case and seems not critical.

How images are generated:

    ==== gen-writes ======
    #!/usr/bin/env python
    import random
    import sys

    size = 4 * 1024 * 1024 * 1024
    block = 64 * 1024
    block2 = 1024 * 1024

    arg = sys.argv[1]

    if arg in ('rand', 'reverse', 'seq'):
        writes = list(range(0, size, block))

    if arg == 'rand':
        random.shuffle(writes)
    elif arg == 'reverse':
        writes.reverse()
    elif arg == 'part-rand':
        writes = []
        for off in range(0, size, block2):
            wr = list(range(off, off + block2, block))
            random.shuffle(wr)
            writes.extend(wr)
    elif arg != 'seq':
        sys.exit(1)

    for w in writes:
        print 'write -P 0xff {} {}'.format(w, block)

    print 'q'
    ==========================

    ===== gen-test-images.sh =====
    #!/bin/bash

    IMG_PATH=/ssd

    for name in seq reverse rand part-rand; do
        IMG=$IMG_PATH/t-$name.qcow2
        echo createing $IMG ...
        rm -f $IMG
        qemu-img create -f qcow2 $IMG 4G
        gen-writes $name | qemu-io $IMG
    done
    ==============================


Vladimir Sementsov-Ogievskiy (4):
  block: introduce aio task pool
  block/qcow2: refactor qcow2_co_preadv_part
  block/qcow2: refactor qcow2_co_pwritev_part
  block/qcow2: introduce parallel subrequest handling in read and write

 qapi/block-core.json |   2 +-
 block/aio_task.h     |  52 +++++
 block/aio_task.c     | 119 +++++++++++
 block/qcow2.c        | 459 ++++++++++++++++++++++++++++---------------
 block/Makefile.objs  |   2 +
 block/trace-events   |   1 +
 6 files changed, 477 insertions(+), 158 deletions(-)
 create mode 100644 block/aio_task.h
 create mode 100644 block/aio_task.c

-- 
2.18.0



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool
  2019-07-30 14:18 [Qemu-devel] [PATCH v2 0/4] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
@ 2019-07-30 14:18 ` Vladimir Sementsov-Ogievskiy
  2019-08-13 20:47   ` Max Reitz
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-07-30 14:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, vsementsov, armbru, mreitz, stefanha, den

Common interface for aio task loops. To be used for improving
performance of synchronous io loops in qcow2, block-stream,
copy-on-read, and may be other places.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/aio_task.h    |  52 +++++++++++++++++++
 block/aio_task.c    | 119 ++++++++++++++++++++++++++++++++++++++++++++
 block/Makefile.objs |   2 +
 3 files changed, 173 insertions(+)
 create mode 100644 block/aio_task.h
 create mode 100644 block/aio_task.c

diff --git a/block/aio_task.h b/block/aio_task.h
new file mode 100644
index 0000000000..933af1d8e7
--- /dev/null
+++ b/block/aio_task.h
@@ -0,0 +1,52 @@
+/*
+ * Aio tasks loops
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_AIO_TASK_H
+#define BLOCK_AIO_TASK_H
+
+#include "qemu/coroutine.h"
+
+typedef struct AioTaskPool AioTaskPool;
+typedef struct AioTask AioTask;
+typedef int (*AioTaskFunc)(AioTask *task);
+struct AioTask {
+    AioTaskPool *pool;
+    AioTaskFunc func;
+    int ret;
+};
+
+/*
+ * aio_task_pool_new
+ *
+ * The caller is responsible to g_free AioTaskPool pointer after use.
+ */
+AioTaskPool *aio_task_pool_new(int max_busy_tasks);
+int aio_task_pool_status(AioTaskPool *pool);
+bool aio_task_pool_empty(AioTaskPool *pool);
+void aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);
+void aio_task_pool_wait_slot(AioTaskPool *pool);
+void aio_task_pool_wait_one(AioTaskPool *pool);
+void aio_task_pool_wait_all(AioTaskPool *pool);
+
+#endif /* BLOCK_AIO_TASK_H */
diff --git a/block/aio_task.c b/block/aio_task.c
new file mode 100644
index 0000000000..807be8deb5
--- /dev/null
+++ b/block/aio_task.c
@@ -0,0 +1,119 @@
+/*
+ * Aio tasks loops
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "aio_task.h"
+
+struct AioTaskPool {
+    Coroutine *main_co;
+    int status;
+    int max_busy_tasks;
+    int busy_tasks;
+    bool wait_done;
+};
+
+static void aio_task_co(void *opaque)
+{
+    AioTask *task = opaque;
+    AioTaskPool *pool = task->pool;
+
+    assert(pool->busy_tasks < pool->max_busy_tasks);
+    pool->busy_tasks++;
+
+    task->ret = task->func(task);
+
+    pool->busy_tasks--;
+
+    if (task->ret < 0 && pool->status == 0) {
+        pool->status = task->ret;
+    }
+
+    g_free(task);
+
+    if (pool->wait_done) {
+        pool->wait_done = false;
+        aio_co_wake(pool->main_co);
+    }
+}
+
+void aio_task_pool_wait_one(AioTaskPool *pool)
+{
+    assert(pool->busy_tasks > 0);
+    assert(qemu_coroutine_self() == pool->main_co);
+
+    pool->wait_done = true;
+    qemu_coroutine_yield();
+
+    assert(!pool->wait_done);
+    assert(pool->busy_tasks < pool->max_busy_tasks);
+}
+
+void aio_task_pool_wait_slot(AioTaskPool *pool)
+{
+    if (pool->busy_tasks < pool->max_busy_tasks) {
+        return;
+    }
+
+    aio_task_pool_wait_one(pool);
+}
+
+void aio_task_pool_wait_all(AioTaskPool *pool)
+{
+    while (pool->busy_tasks > 0) {
+        aio_task_pool_wait_one(pool);
+    }
+}
+
+void aio_task_pool_start_task(AioTaskPool *pool, AioTask *task)
+{
+    aio_task_pool_wait_slot(pool);
+
+    task->pool = pool;
+    qemu_coroutine_enter(qemu_coroutine_create(aio_task_co, task));
+}
+
+AioTaskPool *aio_task_pool_new(int max_busy_tasks)
+{
+    AioTaskPool *pool = g_new0(AioTaskPool, 1);
+
+    pool->main_co = qemu_coroutine_self();
+    pool->max_busy_tasks = max_busy_tasks;
+
+    return pool;
+}
+
+int aio_task_pool_status(AioTaskPool *pool)
+{
+    if (!pool) {
+        return 0; /* Sugar for lazy allocation of aio pool */
+    }
+
+    return pool->status;
+}
+
+bool aio_task_pool_empty(AioTaskPool *pool)
+{
+    return pool->busy_tasks == 0;
+}
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 35f3bca4d9..c2eb8c8769 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -40,6 +40,8 @@ block-obj-y += throttle.o copy-on-read.o
 
 block-obj-y += crypto.o
 
+block-obj-y += aio_task.o
+
 common-obj-y += stream.o
 
 nfs.o-libs         := $(LIBNFS_LIBS)
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part
  2019-07-30 14:18 [Qemu-devel] [PATCH v2 0/4] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
@ 2019-07-30 14:18 ` Vladimir Sementsov-Ogievskiy
  2019-08-13 21:31   ` Max Reitz
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 4/4] block/qcow2: introduce parallel subrequest handling in read and write Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-07-30 14:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, vsementsov, armbru, mreitz, stefanha, den

Further patch will run partial requests of iterations of
qcow2_co_preadv in parallel for performance reasons. To prepare for
this, separate part which may be parallelized into separate function
(qcow2_co_preadv_task).

While being here, also separate encrypted clusters reading to own
function, like it is done for compressed reading.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json |   2 +-
 block/qcow2.c        | 206 +++++++++++++++++++++++--------------------
 2 files changed, 112 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0d43d4f37c..dd80aa11db 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3266,7 +3266,7 @@
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
             'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
             'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-            'cor_write', 'cluster_alloc_space', 'none'] }
+            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
 
 ##
 # @BlkdebugIOType:
diff --git a/block/qcow2.c b/block/qcow2.c
index 93ab7edcea..7fa71968b2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1967,17 +1967,115 @@ out:
     return ret;
 }
 
+static coroutine_fn int
+qcow2_co_preadv_encrypted(BlockDriverState *bs,
+                           uint64_t file_cluster_offset,
+                           uint64_t offset,
+                           uint64_t bytes,
+                           QEMUIOVector *qiov,
+                           uint64_t qiov_offset)
+{
+    int ret;
+    BDRVQcow2State *s = bs->opaque;
+    uint8_t *buf;
+
+    assert(bs->encrypted && s->crypto);
+    assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+
+    /*
+     * For encrypted images, read everything into a temporary
+     * contiguous buffer on which the AES functions can work.
+     * Note, that we can implement enctyption, working on qiov,
+     * but we must not do decryption in guest buffers for security
+     * reasons.
+     */
+
+    buf = qemu_try_blockalign(s->data_file->bs, bytes);
+    if (buf == NULL) {
+        return -ENOMEM;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_READ_ENCRYPTED);
+    ret = bdrv_co_pread(s->data_file,
+                        file_cluster_offset + offset_into_cluster(s, offset),
+                        bytes, buf, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+    if (qcow2_co_decrypt(bs, file_cluster_offset, offset, buf, bytes) < 0) {
+        ret = -EIO;
+        goto fail;
+    }
+    qemu_iovec_from_buf(qiov, qiov_offset, buf, bytes);
+
+fail:
+    qemu_vfree(buf);
+
+    return ret;
+}
+
+static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
+                                             QCow2ClusterType cluster_type,
+                                             uint64_t file_cluster_offset,
+                                             uint64_t offset, uint64_t bytes,
+                                             QEMUIOVector *qiov,
+                                             size_t qiov_offset)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int offset_in_cluster = offset_into_cluster(s, offset);
+
+    switch (cluster_type) {
+    case QCOW2_CLUSTER_UNALLOCATED:
+        assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
+
+        BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
+        return bdrv_co_preadv_part(bs->backing, offset, bytes,
+                                   qiov, qiov_offset, 0);
+
+    case QCOW2_CLUSTER_COMPRESSED:
+        return qcow2_co_preadv_compressed(bs, file_cluster_offset,
+                                          offset, bytes, qiov, qiov_offset);
+
+    case QCOW2_CLUSTER_NORMAL:
+        if ((file_cluster_offset & 511) != 0) {
+            return -EIO;
+        }
+
+        if (bs->encrypted) {
+            return qcow2_co_preadv_encrypted(bs, file_cluster_offset,
+                                             offset, bytes, qiov, qiov_offset);
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+        return bdrv_co_preadv_part(s->data_file,
+                                   file_cluster_offset + offset_in_cluster,
+                                   bytes, qiov, qiov_offset, 0);
+
+    default:
+        g_assert_not_reached();
+        /*
+         * QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC handled
+         * in qcow2_co_preadv_part
+         */
+    }
+
+    g_assert_not_reached();
+
+    return -EIO;
+}
+
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
                                              size_t qiov_offset, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
-    int offset_in_cluster;
     int ret;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t cluster_offset = 0;
-    uint8_t *cluster_data = NULL;
 
     while (bytes != 0) {
 
@@ -1992,111 +2090,29 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
-            goto fail;
+            return ret;
         }
 
-        offset_in_cluster = offset_into_cluster(s, offset);
-
-        switch (ret) {
-        case QCOW2_CLUSTER_UNALLOCATED:
-
-            if (bs->backing) {
-                BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
-                ret = bdrv_co_preadv_part(bs->backing, offset, cur_bytes,
-                                          qiov, qiov_offset, 0);
-                if (ret < 0) {
-                    goto fail;
-                }
-            } else {
-                /* Note: in this case, no need to wait */
-                qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
-            }
-            break;
-
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
+            ret == QCOW2_CLUSTER_ZERO_ALLOC ||
+            (ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+        {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
-            break;
-
-        case QCOW2_CLUSTER_COMPRESSED:
-            ret = qcow2_co_preadv_compressed(bs, cluster_offset,
-                                             offset, cur_bytes,
-                                             qiov, qiov_offset);
+        } else {
+            ret = qcow2_co_preadv_task(bs, ret,
+                                       cluster_offset, offset, cur_bytes,
+                                       qiov, qiov_offset);
             if (ret < 0) {
-                goto fail;
-            }
-
-            break;
-
-        case QCOW2_CLUSTER_NORMAL:
-            if ((cluster_offset & 511) != 0) {
-                ret = -EIO;
-                goto fail;
-            }
-
-            if (bs->encrypted) {
-                assert(s->crypto);
-
-                /*
-                 * For encrypted images, read everything into a temporary
-                 * contiguous buffer on which the AES functions can work.
-                 */
-                if (!cluster_data) {
-                    cluster_data =
-                        qemu_try_blockalign(s->data_file->bs,
-                                            QCOW_MAX_CRYPT_CLUSTERS
-                                            * s->cluster_size);
-                    if (cluster_data == NULL) {
-                        ret = -ENOMEM;
-                        goto fail;
-                    }
-                }
-
-                assert(cur_bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
-
-                BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-                ret = bdrv_co_pread(s->data_file,
-                                    cluster_offset + offset_in_cluster,
-                                    cur_bytes, cluster_data, 0);
-                if (ret < 0) {
-                    goto fail;
-                }
-
-                assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-                assert((cur_bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
-                if (qcow2_co_decrypt(bs, cluster_offset, offset,
-                                     cluster_data, cur_bytes) < 0) {
-                    ret = -EIO;
-                    goto fail;
-                }
-                qemu_iovec_from_buf(qiov, qiov_offset, cluster_data, cur_bytes);
-            } else {
-                BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-                ret = bdrv_co_preadv_part(s->data_file,
-                                          cluster_offset + offset_in_cluster,
-                                          cur_bytes, qiov, qiov_offset, 0);
-                if (ret < 0) {
-                    goto fail;
-                }
+                return ret;
             }
-            break;
-
-        default:
-            g_assert_not_reached();
-            ret = -EIO;
-            goto fail;
         }
 
         bytes -= cur_bytes;
         offset += cur_bytes;
         qiov_offset += cur_bytes;
     }
-    ret = 0;
-
-fail:
-    qemu_vfree(cluster_data);
 
-    return ret;
+    return 0;
 }
 
 /* Check if it's possible to merge a write request with the writing of
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part
  2019-07-30 14:18 [Qemu-devel] [PATCH v2 0/4] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
@ 2019-07-30 14:18 ` Vladimir Sementsov-Ogievskiy
  2019-08-14 15:55   ` Max Reitz
  2019-08-14 16:23   ` Max Reitz
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 4/4] block/qcow2: introduce parallel subrequest handling in read and write Vladimir Sementsov-Ogievskiy
  3 siblings, 2 replies; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-07-30 14:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, vsementsov, armbru, mreitz, stefanha, den

Similarly to previous commit, prepare for parallelizing write-loop
iterations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c | 150 +++++++++++++++++++++++++++++---------------------
 1 file changed, 88 insertions(+), 62 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7fa71968b2..37766b8b7c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2235,6 +2235,87 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     return 0;
 }
 
+/*
+ * qcow2_co_pwritev_task
+ * Called with s->lock unlocked
+ * l2meta  - if not NULL, qcow2_co_do_pwritev() will consume it. Caller must not
+ *           use it somehow after qcow2_co_pwritev_task() call
+ */
+static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
+                                              uint64_t file_cluster_offset,
+                                              uint64_t offset, uint64_t bytes,
+                                              QEMUIOVector *qiov,
+                                              uint64_t qiov_offset,
+                                              QCowL2Meta *l2meta)
+{
+    int ret;
+    BDRVQcow2State *s = bs->opaque;
+    void *crypt_buf = NULL;
+    int offset_in_cluster = offset_into_cluster(s, offset);
+    QEMUIOVector encrypted_qiov;
+
+    if (bs->encrypted) {
+        assert(s->crypto);
+        assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+        crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
+        if (crypt_buf == NULL) {
+            ret = -ENOMEM;
+            goto out_unlocked;
+        }
+        qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
+
+        if (qcow2_co_encrypt(bs, file_cluster_offset, offset,
+                             crypt_buf, bytes) < 0) {
+            ret = -EIO;
+            goto out_unlocked;
+        }
+
+        qemu_iovec_init_buf(&encrypted_qiov, crypt_buf, bytes);
+        qiov = &encrypted_qiov;
+        qiov_offset = 0;
+    }
+
+    /* Try to efficiently initialize the physical space with zeroes */
+    ret = handle_alloc_space(bs, l2meta);
+    if (ret < 0) {
+        goto out_unlocked;
+    }
+
+    /*
+     * If we need to do COW, check if it's possible to merge the
+     * writing of the guest data together with that of the COW regions.
+     * If it's not possible (or not necessary) then write the
+     * guest data now.
+     */
+    if (!merge_cow(offset, bytes, qiov, qiov_offset, l2meta)) {
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+        trace_qcow2_writev_data(qemu_coroutine_self(),
+                                file_cluster_offset + offset_in_cluster);
+        ret = bdrv_co_pwritev_part(s->data_file,
+                                   file_cluster_offset + offset_in_cluster,
+                                   bytes, qiov, qiov_offset, 0);
+        if (ret < 0) {
+            goto out_unlocked;
+        }
+    }
+
+    qemu_co_mutex_lock(&s->lock);
+
+    ret = qcow2_handle_l2meta(bs, &l2meta, true);
+    goto out_locked;
+
+out_unlocked:
+    qemu_co_mutex_lock(&s->lock);
+
+out_locked:
+    qcow2_handle_l2meta(bs, &l2meta, false);
+    qemu_co_mutex_unlock(&s->lock);
+
+    qemu_vfree(crypt_buf);
+
+    return ret;
+}
+
 static coroutine_fn int qcow2_co_pwritev_part(
         BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         QEMUIOVector *qiov, size_t qiov_offset, int flags)
@@ -2244,15 +2325,11 @@ static coroutine_fn int qcow2_co_pwritev_part(
     int ret;
     unsigned int cur_bytes; /* number of sectors in current iteration */
     uint64_t cluster_offset;
-    QEMUIOVector encrypted_qiov;
     uint64_t bytes_done = 0;
-    uint8_t *cluster_data = NULL;
     QCowL2Meta *l2meta = NULL;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
-    qemu_co_mutex_lock(&s->lock);
-
     while (bytes != 0) {
 
         l2meta = NULL;
@@ -2266,6 +2343,8 @@ static coroutine_fn int qcow2_co_pwritev_part(
                             - offset_in_cluster);
         }
 
+        qemu_co_mutex_lock(&s->lock);
+
         ret = qcow2_alloc_cluster_offset(bs, offset, &cur_bytes,
                                          &cluster_offset, &l2meta);
         if (ret < 0) {
@@ -2283,62 +2362,11 @@ static coroutine_fn int qcow2_co_pwritev_part(
 
         qemu_co_mutex_unlock(&s->lock);
 
-        if (bs->encrypted) {
-            assert(s->crypto);
-            if (!cluster_data) {
-                cluster_data = qemu_try_blockalign(bs->file->bs,
-                                                   QCOW_MAX_CRYPT_CLUSTERS
-                                                   * s->cluster_size);
-                if (cluster_data == NULL) {
-                    ret = -ENOMEM;
-                    goto out_unlocked;
-                }
-            }
-
-            assert(cur_bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
-            qemu_iovec_to_buf(qiov, qiov_offset + bytes_done,
-                              cluster_data, cur_bytes);
-
-            if (qcow2_co_encrypt(bs, cluster_offset, offset,
-                                 cluster_data, cur_bytes) < 0) {
-                ret = -EIO;
-                goto out_unlocked;
-            }
-
-            qemu_iovec_init_buf(&encrypted_qiov, cluster_data, cur_bytes);
-        }
-
-        /* Try to efficiently initialize the physical space with zeroes */
-        ret = handle_alloc_space(bs, l2meta);
+        ret = qcow2_co_pwritev_task(bs, cluster_offset, offset, cur_bytes,
+                                    qiov, bytes_done, l2meta);
+        l2meta = NULL; /* l2meta is consumed by qcow2_co_do_pwritev() */
         if (ret < 0) {
-            goto out_unlocked;
-        }
-
-        /* If we need to do COW, check if it's possible to merge the
-         * writing of the guest data together with that of the COW regions.
-         * If it's not possible (or not necessary) then write the
-         * guest data now. */
-        if (!merge_cow(offset, cur_bytes,
-                       bs->encrypted ? &encrypted_qiov : qiov,
-                       bs->encrypted ? 0 : qiov_offset + bytes_done, l2meta))
-        {
-            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-            trace_qcow2_writev_data(qemu_coroutine_self(),
-                                    cluster_offset + offset_in_cluster);
-            ret = bdrv_co_pwritev_part(
-                    s->data_file, cluster_offset + offset_in_cluster, cur_bytes,
-                    bs->encrypted ? &encrypted_qiov : qiov,
-                    bs->encrypted ? 0 : qiov_offset + bytes_done, 0);
-            if (ret < 0) {
-                goto out_unlocked;
-            }
-        }
-
-        qemu_co_mutex_lock(&s->lock);
-
-        ret = qcow2_handle_l2meta(bs, &l2meta, true);
-        if (ret) {
-            goto out_locked;
+            goto fail_nometa;
         }
 
         bytes -= cur_bytes;
@@ -2347,9 +2375,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
         trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
     }
     ret = 0;
-    goto out_locked;
 
-out_unlocked:
     qemu_co_mutex_lock(&s->lock);
 
 out_locked:
@@ -2357,7 +2383,7 @@ out_locked:
 
     qemu_co_mutex_unlock(&s->lock);
 
-    qemu_vfree(cluster_data);
+fail_nometa:
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v2 4/4] block/qcow2: introduce parallel subrequest handling in read and write
  2019-07-30 14:18 [Qemu-devel] [PATCH v2 0/4] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
@ 2019-07-30 14:18 ` Vladimir Sementsov-Ogievskiy
  2019-08-14 16:24   ` Max Reitz
  3 siblings, 1 reply; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-07-30 14:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, vsementsov, armbru, mreitz, stefanha, den

It improves performance for fragmented qcow2 images.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.c      | 125 +++++++++++++++++++++++++++++++++++++++++----
 block/trace-events |   1 +
 2 files changed, 115 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 37766b8b7c..5f0e66ea48 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -40,6 +40,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "crypto.h"
+#include "aio_task.h"
 
 /*
   Differences with QCOW:
@@ -2017,6 +2018,62 @@ fail:
     return ret;
 }
 
+typedef struct Qcow2AioTask {
+    AioTask task;
+
+    BlockDriverState *bs;
+    QCow2ClusterType cluster_type; /* only for read */
+    uint64_t file_cluster_offset;
+    uint64_t offset;
+    uint64_t bytes;
+    QEMUIOVector *qiov;
+    uint64_t qiov_offset;
+    QCowL2Meta *l2meta; /* only for write */
+} Qcow2AioTask;
+
+#define QCOW2_MAX_WORKERS 8
+
+static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task);
+static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
+                                       AioTaskPool *pool,
+                                       AioTaskFunc func,
+                                       QCow2ClusterType cluster_type,
+                                       uint64_t file_cluster_offset,
+                                       uint64_t offset,
+                                       uint64_t bytes,
+                                       QEMUIOVector *qiov,
+                                       size_t qiov_offset,
+                                       QCowL2Meta *l2meta)
+{
+    Qcow2AioTask local_task;
+    Qcow2AioTask *task = pool ? g_new(Qcow2AioTask, 1) : &local_task;
+
+    *task = (Qcow2AioTask) {
+        .task.func = func,
+        .bs = bs,
+        .cluster_type = cluster_type,
+        .qiov = qiov,
+        .file_cluster_offset = file_cluster_offset,
+        .offset = offset,
+        .bytes = bytes,
+        .qiov_offset = qiov_offset,
+        .l2meta = l2meta,
+    };
+
+    trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
+                         func == qcow2_co_preadv_task_entry ? "read" : "write",
+                         cluster_type, file_cluster_offset, offset, bytes,
+                         qiov, qiov_offset);
+
+    if (!pool) {
+        return func(&task->task);
+    }
+
+    aio_task_pool_start_task(pool, &task->task);
+
+    return 0;
+}
+
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
                                              QCow2ClusterType cluster_type,
                                              uint64_t file_cluster_offset,
@@ -2067,6 +2124,16 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
     return -EIO;
 }
 
+static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
+{
+    Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
+
+    assert(!t->l2meta);
+
+    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->file_cluster_offset,
+                                t->offset, t->bytes, t->qiov, t->qiov_offset);
+}
+
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
@@ -2076,9 +2143,9 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
     int ret;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t cluster_offset = 0;
+    AioTaskPool *aio = NULL;
 
-    while (bytes != 0) {
-
+    while (bytes != 0 && aio_task_pool_status(aio) == 0) {
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
         if (s->crypto) {
@@ -2090,7 +2157,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
-            return ret;
+            goto out;
         }
 
         if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
@@ -2099,11 +2166,14 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
-            ret = qcow2_co_preadv_task(bs, ret,
-                                       cluster_offset, offset, cur_bytes,
-                                       qiov, qiov_offset);
+            if (!aio && cur_bytes != bytes) {
+                aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
+            }
+            ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, ret,
+                                 cluster_offset, offset, cur_bytes,
+                                 qiov, qiov_offset, NULL);
             if (ret < 0) {
-                return ret;
+                goto out;
             }
         }
 
@@ -2112,7 +2182,16 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         qiov_offset += cur_bytes;
     }
 
-    return 0;
+out:
+    if (aio) {
+        aio_task_pool_wait_all(aio);
+        if (ret == 0) {
+            ret = aio_task_pool_status(aio);
+        }
+        g_free(aio);
+    }
+
+    return ret;
 }
 
 /* Check if it's possible to merge a write request with the writing of
@@ -2316,6 +2395,17 @@ out_locked:
     return ret;
 }
 
+static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
+{
+    Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
+
+    assert(!t->cluster_type);
+
+    return qcow2_co_pwritev_task(t->bs, t->file_cluster_offset,
+                                 t->offset, t->bytes, t->qiov, t->qiov_offset,
+                                 t->l2meta);
+}
+
 static coroutine_fn int qcow2_co_pwritev_part(
         BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         QEMUIOVector *qiov, size_t qiov_offset, int flags)
@@ -2327,10 +2417,11 @@ static coroutine_fn int qcow2_co_pwritev_part(
     uint64_t cluster_offset;
     uint64_t bytes_done = 0;
     QCowL2Meta *l2meta = NULL;
+    AioTaskPool *aio = NULL;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
-    while (bytes != 0) {
+    while (bytes != 0 && aio_task_pool_status(aio) == 0) {
 
         l2meta = NULL;
 
@@ -2362,8 +2453,12 @@ static coroutine_fn int qcow2_co_pwritev_part(
 
         qemu_co_mutex_unlock(&s->lock);
 
-        ret = qcow2_co_pwritev_task(bs, cluster_offset, offset, cur_bytes,
-                                    qiov, bytes_done, l2meta);
+        if (!aio && cur_bytes != bytes) {
+            aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
+        }
+        ret = qcow2_add_task(bs, aio, qcow2_co_pwritev_task_entry, 0,
+                             cluster_offset, offset, cur_bytes,
+                             qiov, bytes_done, l2meta);
         l2meta = NULL; /* l2meta is consumed by qcow2_co_do_pwritev() */
         if (ret < 0) {
             goto fail_nometa;
@@ -2384,6 +2479,14 @@ out_locked:
     qemu_co_mutex_unlock(&s->lock);
 
 fail_nometa:
+    if (aio) {
+        aio_task_pool_wait_all(aio);
+        if (ret == 0) {
+            ret = aio_task_pool_status(aio);
+        }
+        g_free(aio);
+    }
+
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
diff --git a/block/trace-events b/block/trace-events
index d724df0117..7f51550ba3 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
 file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
 
 # qcow2.c
+qcow2_add_task(void *co, void *bs, void *pool, const char *action, int cluster_type, uint64_t file_cluster_offset, uint64_t offset, uint64_t bytes, void *qiov, size_t qiov_offset) "co %p bs %p pool %p: %s: cluster_type %d file_cluster_offset %" PRIu64 " offset %" PRIu64 " bytes %" PRIu64 " qiov %p qiov_offset %zu"
 qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
 qcow2_writev_done_req(void *co, int ret) "co %p ret %d"
 qcow2_writev_start_part(void *co) "co %p"
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
@ 2019-08-13 20:47   ` Max Reitz
  2019-08-14  8:18     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 15+ messages in thread
From: Max Reitz @ 2019-08-13 20:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, armbru, stefanha


[-- Attachment #1.1: Type: text/plain, Size: 2711 bytes --]

On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
> Common interface for aio task loops. To be used for improving
> performance of synchronous io loops in qcow2, block-stream,
> copy-on-read, and may be other places.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---

Looks good to me overall.

>  block/aio_task.h    |  52 +++++++++++++++++++

I’ve move this to include/block/.

>  block/aio_task.c    | 119 ++++++++++++++++++++++++++++++++++++++++++++
>  block/Makefile.objs |   2 +
>  3 files changed, 173 insertions(+)
>  create mode 100644 block/aio_task.h
>  create mode 100644 block/aio_task.c
> 
> diff --git a/block/aio_task.h b/block/aio_task.h
> new file mode 100644
> index 0000000000..933af1d8e7
> --- /dev/null
> +++ b/block/aio_task.h

[...]

> +typedef struct AioTaskPool AioTaskPool;
> +typedef struct AioTask AioTask;
> +typedef int (*AioTaskFunc)(AioTask *task);

+coroutine_fn

> +struct AioTask {
> +    AioTaskPool *pool;
> +    AioTaskFunc func;
> +    int ret;
> +};
> +
> +/*
> + * aio_task_pool_new
> + *
> + * The caller is responsible to g_free AioTaskPool pointer after use.

s/to g_free/for g_freeing/ or something similar.

Or you’d just add aio_task_pool_free().

> + */
> +AioTaskPool *aio_task_pool_new(int max_busy_tasks);
> +int aio_task_pool_status(AioTaskPool *pool);

A comment wouldn’t hurt.  It wasn’t immediately clear to me that status
refers to the error code of a failing task (or 0), although it wasn’t
too much of a surprise either.

> +bool aio_task_pool_empty(AioTaskPool *pool);
> +void aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);

Maybe make a note that task->pool will be set automatically?

> +void aio_task_pool_wait_slot(AioTaskPool *pool);
> +void aio_task_pool_wait_one(AioTaskPool *pool);
> +void aio_task_pool_wait_all(AioTaskPool *pool);

Shouldn’t all of these but aio_task_pool_empty() and
aio_task_pool_status() be coroutine_fns?

> +#endif /* BLOCK_AIO_TASK_H */
> diff --git a/block/aio_task.c b/block/aio_task.c
> new file mode 100644
> index 0000000000..807be8deb5
> --- /dev/null
> +++ b/block/aio_task.c

[...]

> +static void aio_task_co(void *opaque)

+coroutine_fn

[...]

> +void aio_task_pool_wait_one(AioTaskPool *pool)
> +{
> +    assert(pool->busy_tasks > 0);
> +    assert(qemu_coroutine_self() == pool->main_co);
> +
> +    pool->wait_done = true;

Hmmm, but the wait actually isn’t done yet. :-)

Maybe s/wait_done/waiting/?

Max

> +    qemu_coroutine_yield();
> +
> +    assert(!pool->wait_done);
> +    assert(pool->busy_tasks < pool->max_busy_tasks);
> +}


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
@ 2019-08-13 21:31   ` Max Reitz
  2019-08-14  9:11     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 15+ messages in thread
From: Max Reitz @ 2019-08-13 21:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, armbru, stefanha


[-- Attachment #1.1: Type: text/plain, Size: 2966 bytes --]

On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
> Further patch will run partial requests of iterations of
> qcow2_co_preadv in parallel for performance reasons. To prepare for
> this, separate part which may be parallelized into separate function
> (qcow2_co_preadv_task).
> 
> While being here, also separate encrypted clusters reading to own
> function, like it is done for compressed reading.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qapi/block-core.json |   2 +-
>  block/qcow2.c        | 206 +++++++++++++++++++++++--------------------
>  2 files changed, 112 insertions(+), 96 deletions(-)

Looks good to me overall, just wondering about some details, as always.

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 93ab7edcea..7fa71968b2 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1967,17 +1967,115 @@ out:
>      return ret;
>  }
>  
> +static coroutine_fn int
> +qcow2_co_preadv_encrypted(BlockDriverState *bs,
> +                           uint64_t file_cluster_offset,
> +                           uint64_t offset,
> +                           uint64_t bytes,
> +                           QEMUIOVector *qiov,
> +                           uint64_t qiov_offset)
> +{
> +    int ret;
> +    BDRVQcow2State *s = bs->opaque;
> +    uint8_t *buf;
> +
> +    assert(bs->encrypted && s->crypto);
> +    assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
> +
> +    /*
> +     * For encrypted images, read everything into a temporary
> +     * contiguous buffer on which the AES functions can work.
> +     * Note, that we can implement enctyption, working on qiov,

-, and s/enctyption/encryption/

> +     * but we must not do decryption in guest buffers for security
> +     * reasons.

"for security reasons" is a bit handwave-y, no?

[...]

> +static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
> +                                             QCow2ClusterType cluster_type,
> +                                             uint64_t file_cluster_offset,
> +                                             uint64_t offset, uint64_t bytes,
> +                                             QEMUIOVector *qiov,
> +                                             size_t qiov_offset)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +    int offset_in_cluster = offset_into_cluster(s, offset);
> +
> +    switch (cluster_type) {

[...]

> +    default:
> +        g_assert_not_reached();
> +        /*
> +         * QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC handled
> +         * in qcow2_co_preadv_part

Hmm, I’d still add them explicitly as cases and put their own
g_assert_not_reach() there.

> +         */
> +    }
> +
> +    g_assert_not_reached();
> +
> +    return -EIO;

Maybe abort()ing instead of g_assert_not_reach() would save you from
having to return here?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool
  2019-08-13 20:47   ` Max Reitz
@ 2019-08-14  8:18     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-14  8:18 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, armbru, stefanha, Denis Lunev

13.08.2019 23:47, Max Reitz wrote:
> On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
>> Common interface for aio task loops. To be used for improving
>> performance of synchronous io loops in qcow2, block-stream,
>> copy-on-read, and may be other places.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
> 
> Looks good to me overall.
> 
>>   block/aio_task.h    |  52 +++++++++++++++++++
> 
> I’ve move this to include/block/.
> 
>>   block/aio_task.c    | 119 ++++++++++++++++++++++++++++++++++++++++++++
>>   block/Makefile.objs |   2 +
>>   3 files changed, 173 insertions(+)
>>   create mode 100644 block/aio_task.h
>>   create mode 100644 block/aio_task.c
>>
>> diff --git a/block/aio_task.h b/block/aio_task.h
>> new file mode 100644
>> index 0000000000..933af1d8e7
>> --- /dev/null
>> +++ b/block/aio_task.h
> 
> [...]
> 
>> +typedef struct AioTaskPool AioTaskPool;
>> +typedef struct AioTask AioTask;
>> +typedef int (*AioTaskFunc)(AioTask *task);
> 
> +coroutine_fn
> 
>> +struct AioTask {
>> +    AioTaskPool *pool;
>> +    AioTaskFunc func;
>> +    int ret;
>> +};
>> +
>> +/*
>> + * aio_task_pool_new
>> + *
>> + * The caller is responsible to g_free AioTaskPool pointer after use.
> 
> s/to g_free/for g_freeing/ or something similar.
> 
> Or you’d just add aio_task_pool_free().
> 
>> + */
>> +AioTaskPool *aio_task_pool_new(int max_busy_tasks);
>> +int aio_task_pool_status(AioTaskPool *pool);
> 
> A comment wouldn’t hurt.  It wasn’t immediately clear to me that status
> refers to the error code of a failing task (or 0), although it wasn’t
> too much of a surprise either.
> 
>> +bool aio_task_pool_empty(AioTaskPool *pool);
>> +void aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);
> 
> Maybe make a note that task->pool will be set automatically?
> 
>> +void aio_task_pool_wait_slot(AioTaskPool *pool);
>> +void aio_task_pool_wait_one(AioTaskPool *pool);
>> +void aio_task_pool_wait_all(AioTaskPool *pool);
> 
> Shouldn’t all of these but aio_task_pool_empty() and
> aio_task_pool_status() be coroutine_fns?
> 
>> +#endif /* BLOCK_AIO_TASK_H */
>> diff --git a/block/aio_task.c b/block/aio_task.c
>> new file mode 100644
>> index 0000000000..807be8deb5
>> --- /dev/null
>> +++ b/block/aio_task.c
> 
> [...]
> 
>> +static void aio_task_co(void *opaque)
> 
> +coroutine_fn
> 
> [...]
> 
>> +void aio_task_pool_wait_one(AioTaskPool *pool)
>> +{
>> +    assert(pool->busy_tasks > 0);
>> +    assert(qemu_coroutine_self() == pool->main_co);
>> +
>> +    pool->wait_done = true;
> 
> Hmmm, but the wait actually isn’t done yet. :-)
> 
> Maybe s/wait_done/waiting/?
> 


Aha, really bad variable name. I meant "wait for one task done". Just "waiting" would be appropriate.

Thanks for reviewing!

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part
  2019-08-13 21:31   ` Max Reitz
@ 2019-08-14  9:11     ` Vladimir Sementsov-Ogievskiy
  2019-08-14 15:03       ` Max Reitz
  2019-08-14 15:15       ` Eric Blake
  0 siblings, 2 replies; 15+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-14  9:11 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, armbru, stefanha, Denis Lunev

14.08.2019 0:31, Max Reitz wrote:
> On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
>> Further patch will run partial requests of iterations of
>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>> this, separate part which may be parallelized into separate function
>> (qcow2_co_preadv_task).
>>
>> While being here, also separate encrypted clusters reading to own
>> function, like it is done for compressed reading.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   qapi/block-core.json |   2 +-
>>   block/qcow2.c        | 206 +++++++++++++++++++++++--------------------
>>   2 files changed, 112 insertions(+), 96 deletions(-)
> 
> Looks good to me overall, just wondering about some details, as always.
> 
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 93ab7edcea..7fa71968b2 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1967,17 +1967,115 @@ out:
>>       return ret;
>>   }
>>   
>> +static coroutine_fn int
>> +qcow2_co_preadv_encrypted(BlockDriverState *bs,
>> +                           uint64_t file_cluster_offset,
>> +                           uint64_t offset,
>> +                           uint64_t bytes,
>> +                           QEMUIOVector *qiov,
>> +                           uint64_t qiov_offset)
>> +{
>> +    int ret;
>> +    BDRVQcow2State *s = bs->opaque;
>> +    uint8_t *buf;
>> +
>> +    assert(bs->encrypted && s->crypto);
>> +    assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
>> +
>> +    /*
>> +     * For encrypted images, read everything into a temporary
>> +     * contiguous buffer on which the AES functions can work.
>> +     * Note, that we can implement enctyption, working on qiov,
> 
> -, and s/enctyption/encryption/
> 
>> +     * but we must not do decryption in guest buffers for security
>> +     * reasons.
> 
> "for security reasons" is a bit handwave-y, no?

Hmm, let's think of it a bit.

WRITE

1. We can't do any operations on write buffers, as guest may use them for
something else and not prepared for their change. [thx to Den, pointed to this fact]

READ

Hmm, here otherwise, guest should not expect something meaningful in buffers until the
end of read operation, so theoretically we may decrypt directly in guest buffer.. What is
bad with it?

1. Making read-part different from write and implementing support of qiov for decryptin for
little outcome (hmm, don't double allocation for reads, is it little or not? [*]).

2. Guest can read its buffers.
So, it may see encrypted data and guess something about it. Ideally guest
should know nothing about encryption, but on the other hand, is there any
real damage? I don't sure..

3. Guest can modify its buffers.
3.1 I think there is no guarantee that guest will not modify its data before we finished
copying to separate buffer, so what guest finally reads is not predictable anyway.
3.2 But, modifying during decryption may possibly lead to guest visible error
(which will never be if we operate on separated cluster)

So if we don't afraid of [2] and [3.2], and in a specific case [*] is significant, we may want
implement decryption on guest buffers at least as an option..
But all it looks for me like we'll never do it.

===

So, I'd rewrite my "Note" like this:

    Also, decryption in separate buffer is better as it hides from the guest information
    it doesn't own (about encrypted nature of virtual disk).

> 
> [...]
> 
>> +static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
>> +                                             QCow2ClusterType cluster_type,
>> +                                             uint64_t file_cluster_offset,
>> +                                             uint64_t offset, uint64_t bytes,
>> +                                             QEMUIOVector *qiov,
>> +                                             size_t qiov_offset)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +    int offset_in_cluster = offset_into_cluster(s, offset);
>> +
>> +    switch (cluster_type) {
> 
> [...]
> 
>> +    default:
>> +        g_assert_not_reached();
>> +        /*
>> +         * QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC handled
>> +         * in qcow2_co_preadv_part
> 
> Hmm, I’d still add them explicitly as cases and put their own
> g_assert_not_reach() there.
> 
>> +         */
>> +    }
>> +
>> +    g_assert_not_reached();
>> +
>> +    return -EIO;
> 
> Maybe abort()ing instead of g_assert_not_reach() would save you from
> having to return here?
> 

Hmm, will check. Any reason to use g_assert_not_reached() instead of abort() in "default"?
I just kept it like it was. But it seems to be more often practice to use just abort() in
Qemu code.

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part
  2019-08-14  9:11     ` Vladimir Sementsov-Ogievskiy
@ 2019-08-14 15:03       ` Max Reitz
  2019-08-14 15:15       ` Eric Blake
  1 sibling, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-08-14 15:03 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, armbru, stefanha, Denis Lunev


[-- Attachment #1.1: Type: text/plain, Size: 5531 bytes --]

On 14.08.19 11:11, Vladimir Sementsov-Ogievskiy wrote:
> 14.08.2019 0:31, Max Reitz wrote:
>> On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
>>> Further patch will run partial requests of iterations of
>>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>>> this, separate part which may be parallelized into separate function
>>> (qcow2_co_preadv_task).
>>>
>>> While being here, also separate encrypted clusters reading to own
>>> function, like it is done for compressed reading.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   qapi/block-core.json |   2 +-
>>>   block/qcow2.c        | 206 +++++++++++++++++++++++--------------------
>>>   2 files changed, 112 insertions(+), 96 deletions(-)
>>
>> Looks good to me overall, just wondering about some details, as always.
>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 93ab7edcea..7fa71968b2 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -1967,17 +1967,115 @@ out:
>>>       return ret;
>>>   }
>>>   
>>> +static coroutine_fn int
>>> +qcow2_co_preadv_encrypted(BlockDriverState *bs,
>>> +                           uint64_t file_cluster_offset,
>>> +                           uint64_t offset,
>>> +                           uint64_t bytes,
>>> +                           QEMUIOVector *qiov,
>>> +                           uint64_t qiov_offset)
>>> +{
>>> +    int ret;
>>> +    BDRVQcow2State *s = bs->opaque;
>>> +    uint8_t *buf;
>>> +
>>> +    assert(bs->encrypted && s->crypto);
>>> +    assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
>>> +
>>> +    /*
>>> +     * For encrypted images, read everything into a temporary
>>> +     * contiguous buffer on which the AES functions can work.
>>> +     * Note, that we can implement enctyption, working on qiov,
>>
>> -, and s/enctyption/encryption/
>>
>>> +     * but we must not do decryption in guest buffers for security
>>> +     * reasons.
>>
>> "for security reasons" is a bit handwave-y, no?
> 
> Hmm, let's think of it a bit.
> 
> WRITE
> 
> 1. We can't do any operations on write buffers, as guest may use them for
> something else and not prepared for their change. [thx to Den, pointed to this fact]

Yep.  So we actually cannot implement encryption in the guest buffer. :-)

> READ
> 
> Hmm, here otherwise, guest should not expect something meaningful in buffers until the
> end of read operation, so theoretically we may decrypt directly in guest buffer.. What is
> bad with it?
> 
> 1. Making read-part different from write and implementing support of qiov for decryptin for
> little outcome (hmm, don't double allocation for reads, is it little or not? [*]).
> 
> 2. Guest can read its buffers.
> So, it may see encrypted data and guess something about it. Ideally guest
> should know nothing about encryption, but on the other hand, is there any
> real damage? I don't sure..
> 
> 3. Guest can modify its buffers.
> 3.1 I think there is no guarantee that guest will not modify its data before we finished
> copying to separate buffer, so what guest finally reads is not predictable anyway.
> 3.2 But, modifying during decryption may possibly lead to guest visible error
> (which will never be if we operate on separated cluster)
> 
> So if we don't afraid of [2] and [3.2], and in a specific case [*] is significant, we may want
> implement decryption on guest buffers at least as an option..
> But all it looks for me like we'll never do it.

Well, I do think it would be weird from a guest perspective to see data
changing twice.  I just couldn’t figure out what the security problem
might be.

> ===
> 
> So, I'd rewrite my "Note" like this:
> 
>     Also, decryption in separate buffer is better as it hides from the guest information
>     it doesn't own (about encrypted nature of virtual disk).

Sounds good.

>> [...]
>>
>>> +static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
>>> +                                             QCow2ClusterType cluster_type,
>>> +                                             uint64_t file_cluster_offset,
>>> +                                             uint64_t offset, uint64_t bytes,
>>> +                                             QEMUIOVector *qiov,
>>> +                                             size_t qiov_offset)
>>> +{
>>> +    BDRVQcow2State *s = bs->opaque;
>>> +    int offset_in_cluster = offset_into_cluster(s, offset);
>>> +
>>> +    switch (cluster_type) {
>>
>> [...]
>>
>>> +    default:
>>> +        g_assert_not_reached();
>>> +        /*
>>> +         * QCOW2_CLUSTER_ZERO_PLAIN and QCOW2_CLUSTER_ZERO_ALLOC handled
>>> +         * in qcow2_co_preadv_part
>>
>> Hmm, I’d still add them explicitly as cases and put their own
>> g_assert_not_reach() there.
>>
>>> +         */
>>> +    }
>>> +
>>> +    g_assert_not_reached();
>>> +
>>> +    return -EIO;
>>
>> Maybe abort()ing instead of g_assert_not_reach() would save you from
>> having to return here?
>>
> 
> Hmm, will check. Any reason to use g_assert_not_reached() instead of abort() in "default"?

g_assert_not_reached() makes it more explicit what it’s about.

> I just kept it like it was. But it seems to be more often practice to use just abort() in
> Qemu code.

Yes, we often use abort() instead because it always has _Noreturn (and
thus saves us from such useless return statements).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part
  2019-08-14  9:11     ` Vladimir Sementsov-Ogievskiy
  2019-08-14 15:03       ` Max Reitz
@ 2019-08-14 15:15       ` Eric Blake
  2019-08-14 15:58         ` Max Reitz
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Blake @ 2019-08-14 15:15 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-devel, qemu-block
  Cc: kwolf, armbru, stefanha, Denis Lunev


[-- Attachment #1.1: Type: text/plain, Size: 3764 bytes --]

On 8/14/19 4:11 AM, Vladimir Sementsov-Ogievskiy wrote:
> 14.08.2019 0:31, Max Reitz wrote:
>> On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
>>> Further patch will run partial requests of iterations of
>>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>>> this, separate part which may be parallelized into separate function
>>> (qcow2_co_preadv_task).
>>>
>>> While being here, also separate encrypted clusters reading to own
>>> function, like it is done for compressed reading.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---

>>> +     * but we must not do decryption in guest buffers for security
>>> +     * reasons.
>>
>> "for security reasons" is a bit handwave-y, no?
> 
> Hmm, let's think of it a bit.
> 
> WRITE
> 
> 1. We can't do any operations on write buffers, as guest may use them for
> something else and not prepared for their change. [thx to Den, pointed to this fact]
> 
> READ
> 
> Hmm, here otherwise, guest should not expect something meaningful in buffers until the
> end of read operation, so theoretically we may decrypt directly in guest buffer.. What is
> bad with it?

The badness is that the guest can theoretically reverse-engineer the
encryption keys if they are savvy enough to grab the contents of the
buffer before and after.  The guest must NEVER be able to see the
encrypted bits, which means decryption requires a bounce buffer.

> 
> 1. Making read-part different from write and implementing support of qiov for decryptin for
> little outcome (hmm, don't double allocation for reads, is it little or not? [*]).
> 
> 2. Guest can read its buffers.
> So, it may see encrypted data and guess something about it. Ideally guest
> should know nothing about encryption, but on the other hand, is there any
> real damage? I don't sure..

Yes, this is the security risk.

> 
> 3. Guest can modify its buffers.
> 3.1 I think there is no guarantee that guest will not modify its data before we finished
> copying to separate buffer, so what guest finally reads is not predictable anyway.
> 3.2 But, modifying during decryption may possibly lead to guest visible error
> (which will never be if we operate on separated cluster)
> 
> So if we don't afraid of [2] and [3.2], and in a specific case [*] is significant, we may want
> implement decryption on guest buffers at least as an option..
> But all it looks for me like we'll never do it.
> 
> ===
> 
> So, I'd rewrite my "Note" like this:
> 
>     Also, decryption in separate buffer is better as it hides from the guest information
>     it doesn't own (about encrypted nature of virtual disk).

Possible wording tweak:

Also, decryption in a separate buffer is better as it prevents the guest
from learning information about the encrypted nature of the virtual disk.


>>> +    }
>>> +
>>> +    g_assert_not_reached();
>>> +
>>> +    return -EIO;
>>
>> Maybe abort()ing instead of g_assert_not_reach() would save you from
>> having to return here?
>>
> 
> Hmm, will check. Any reason to use g_assert_not_reached() instead of abort() in "default"?
> I just kept it like it was. But it seems to be more often practice to use just abort() in
> Qemu code.

Both are used. abort() is shorter to type, but g_assert_not_reach() is
slightly friendlier to developers (which are the only people that would
ever see the failure).  As both are marked noreturn, the real fix is to
drop the dead return -EIO line, the compiler is smart enough to not need
a return statement after a noreturn function.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
@ 2019-08-14 15:55   ` Max Reitz
  2019-08-14 16:23   ` Max Reitz
  1 sibling, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-08-14 15:55 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, armbru, stefanha


[-- Attachment #1.1: Type: text/plain, Size: 885 bytes --]

On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
> Similarly to previous commit, prepare for parallelizing write-loop
> iterations.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2.c | 150 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 88 insertions(+), 62 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7fa71968b2..37766b8b7c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2283,62 +2362,11 @@ static coroutine_fn int qcow2_co_pwritev_part(

[...]

> +        ret = qcow2_co_pwritev_task(bs, cluster_offset, offset, cur_bytes,
> +                                    qiov, bytes_done, l2meta);

You’re passing bytes_done as qiov_offset here.  That is initialized to
0, so it ignores the qiov_offset given to qcow2_co_pwritev_part().

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part
  2019-08-14 15:15       ` Eric Blake
@ 2019-08-14 15:58         ` Max Reitz
  0 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-08-14 15:58 UTC (permalink / raw)
  To: Eric Blake, Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, armbru, stefanha, Denis Lunev


[-- Attachment #1.1: Type: text/plain, Size: 1628 bytes --]

On 14.08.19 17:15, Eric Blake wrote:
> On 8/14/19 4:11 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 14.08.2019 0:31, Max Reitz wrote:
>>> On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
>>>> Further patch will run partial requests of iterations of
>>>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>>>> this, separate part which may be parallelized into separate function
>>>> (qcow2_co_preadv_task).
>>>>
>>>> While being here, also separate encrypted clusters reading to own
>>>> function, like it is done for compressed reading.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
> 
>>>> +     * but we must not do decryption in guest buffers for security
>>>> +     * reasons.
>>>
>>> "for security reasons" is a bit handwave-y, no?
>>
>> Hmm, let's think of it a bit.
>>
>> WRITE
>>
>> 1. We can't do any operations on write buffers, as guest may use them for
>> something else and not prepared for their change. [thx to Den, pointed to this fact]
>>
>> READ
>>
>> Hmm, here otherwise, guest should not expect something meaningful in buffers until the
>> end of read operation, so theoretically we may decrypt directly in guest buffer.. What is
>> bad with it?
> 
> The badness is that the guest can theoretically reverse-engineer the
> encryption keys if they are savvy enough to grab the contents of the
> buffer before and after.  The guest must NEVER be able to see the
> encrypted bits, which means decryption requires a bounce buffer.

Our encryption does not protect against a known-plaintext attack?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
  2019-08-14 15:55   ` Max Reitz
@ 2019-08-14 16:23   ` Max Reitz
  1 sibling, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-08-14 16:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, armbru, stefanha


[-- Attachment #1.1: Type: text/plain, Size: 807 bytes --]

On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
> Similarly to previous commit, prepare for parallelizing write-loop
> iterations.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2.c | 150 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 88 insertions(+), 62 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7fa71968b2..37766b8b7c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> + * l2meta  - if not NULL, qcow2_co_do_pwritev() will consume it. Caller must not
> + *           use it somehow after qcow2_co_pwritev_task() call

[...]

> +        l2meta = NULL; /* l2meta is consumed by qcow2_co_do_pwritev() */

By the way, qcow2_co_do_pwritev() does not exist. :-)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/4] block/qcow2: introduce parallel subrequest handling in read and write
  2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 4/4] block/qcow2: introduce parallel subrequest handling in read and write Vladimir Sementsov-Ogievskiy
@ 2019-08-14 16:24   ` Max Reitz
  0 siblings, 0 replies; 15+ messages in thread
From: Max Reitz @ 2019-08-14 16:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, den, armbru, stefanha


[-- Attachment #1.1: Type: text/plain, Size: 1550 bytes --]

On 30.07.19 16:18, Vladimir Sementsov-Ogievskiy wrote:
> It improves performance for fragmented qcow2 images.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2.c      | 125 +++++++++++++++++++++++++++++++++++++++++----
>  block/trace-events |   1 +
>  2 files changed, 115 insertions(+), 11 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 37766b8b7c..5f0e66ea48 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2017,6 +2018,62 @@ fail:
>      return ret;
>  }
>  
> +typedef struct Qcow2AioTask {
> +    AioTask task;
> +
> +    BlockDriverState *bs;
> +    QCow2ClusterType cluster_type; /* only for read */
> +    uint64_t file_cluster_offset;
> +    uint64_t offset;
> +    uint64_t bytes;
> +    QEMUIOVector *qiov;
> +    uint64_t qiov_offset;
> +    QCowL2Meta *l2meta; /* only for write */
> +} Qcow2AioTask;
> +
> +#define QCOW2_MAX_WORKERS 8

Maybe move this to the top, or even qcow2.h?

[...]

> @@ -2112,7 +2182,16 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
>          qiov_offset += cur_bytes;
>      }
>  
> -    return 0;
> +out:
> +    if (aio) {
> +        aio_task_pool_wait_all(aio);
> +        if (ret == 0) {
> +            ret = aio_task_pool_status(aio);
> +        }
> +        g_free(aio);
> +    }
> +
> +    return ret;

My gcc complains here that ret may be initialized.  (Which is indeed the
case if @bytes is 0.)

The rest looks good to me.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-08-14 16:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-30 14:18 [Qemu-devel] [PATCH v2 0/4] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 1/4] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
2019-08-13 20:47   ` Max Reitz
2019-08-14  8:18     ` Vladimir Sementsov-Ogievskiy
2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 2/4] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
2019-08-13 21:31   ` Max Reitz
2019-08-14  9:11     ` Vladimir Sementsov-Ogievskiy
2019-08-14 15:03       ` Max Reitz
2019-08-14 15:15       ` Eric Blake
2019-08-14 15:58         ` Max Reitz
2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 3/4] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
2019-08-14 15:55   ` Max Reitz
2019-08-14 16:23   ` Max Reitz
2019-07-30 14:18 ` [Qemu-devel] [PATCH v2 4/4] block/qcow2: introduce parallel subrequest handling in read and write Vladimir Sementsov-Ogievskiy
2019-08-14 16:24   ` Max Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).