QEMU-Devel Archive on lore.kernel.org
* [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io
@ 2019-08-16 15:30 Vladimir Sementsov-Ogievskiy
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache Vladimir Sementsov-Ogievskiy
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-16 15:30 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den

Hi all!

Here is an asynchronous scheme for handling fragmented qcow2
reads and writes. Both the qcow2 read and write functions loop through
sequential portions of data. This series aims to parallelize the
iterations of these loops.
It improves performance for fragmented qcow2 images; I've tested it
as described below.
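
To illustrate the idea only (this is not the code from the series, which
is C and uses QEMU coroutines), here is a minimal Python sketch of running
the iterations of such a loop concurrently with a bounded number of
workers. The names handle_chunk, handle_request and MAX_WORKERS, and the
limit of 8, are assumptions made up for the example.

```python
import asyncio

MAX_WORKERS = 8  # hypothetical limit, chosen only for this sketch


async def handle_chunk(offset, length, done):
    """Stand-in for one sub-request; a real one would do the I/O."""
    await asyncio.sleep(0)  # yield control, as a real I/O wait would
    done.append((offset, length))
    return 0  # 0 means success, a negative value means error


async def handle_request(chunks, max_workers=MAX_WORKERS):
    """Run sub-requests concurrently, at most max_workers at a time."""
    slots = asyncio.Semaphore(max_workers)
    done = []

    async def worker(offset, length):
        async with slots:
            return await handle_chunk(offset, length, done)

    results = await asyncio.gather(*(worker(o, n) for o, n in chunks))
    # Report the first failure, if any, otherwise success.
    ret = next((r for r in results if r < 0), 0)
    return ret, done


if __name__ == "__main__":
    block = 64 * 1024
    chunks = [(i * block, block) for i in range(16)]
    ret, done = asyncio.run(handle_request(chunks))
    print(ret, len(done))
```

A sequential loop would await each chunk before starting the next; here
the waits overlap, which is where a gain on fragmented images would come
from.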

v4 [perf results not updated]:
01: new patch. Unrelated, but need to fix 026 before the series to
    correctly fix it after :)
02: - use coroutine_fn where appropriate (i.e. in aio_task_pool_new too)
    - add Max's r-b
03,04: add Max's r-b
05: fix 026 output

v3 (by Max's comments) [perf results not updated]:

01: - use coroutine_fn where appropriate !!!!!!!!!!!!!!!!!!!!!!!
    - add aio_task_pool_free
    - add some comments
    - move header to include/block
    - s/wait_done/waiting
02: - Rewrite note about decryption in guest buffers [thx to Eric]
    - separate g_assert_not_reached for QCOW2_CLUSTER_ZERO_*
    - drop return after g_assert_not_reached
03: - drop bytes_done and correctly use qiov_offset
    - fix comment
04: - move QCOW2_MAX_WORKERS to block/qcow2.h
    - initialize ret in qcow2_co_preadv_part
Based-on: https://github.com/stefanha/qemu/commits/block


v2: changed a lot, as
 1. a lot of preparations around locks, hd_qiovs, threads for encryption
    are done
 2. I decided to create a separate file with an async request handling API, to
    reuse it for backup, stream and copy-on-read to improve their performance
    too. Mirror and qemu-img convert have their own async request handling;
    maybe we'll finally be able to merge all this similar code into one
    feature.
    Note that not all API calls are used in qcow2; some will be needed in
    later steps for parallelizing other io loops.

About testing:

I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
t-seq.qcow2 - sequentially written qcow2 image
t-reverse.qcow2 - filled by writing 64k portions from end to the start
t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
(see the image generation source code at the end for details)

and I've done several runs like the following (sequential io by 1mb chunks):

    out=/tmp/block; echo > $out; cat /tmp/files | while read file; do for wr in {"","-w"}; do echo "$file" $wr; ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $wr "$file" | grep 'Run completed in' | awk '{print $4}' >> $out; done; done


short info about parameters:
  -w - do writes (otherwise do reads)
  -c - count of blocks
  -s - block size
  -t none - disable cache
  -n - native aio
  -d 1 - don't use parallel requests provided by qemu-img bench itself
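
For readers who prefer a script over the one-liner, the same run can be
sketched in Python. This is only an illustration: the helper names are
made up, the ./qemu-img path is taken from the command above, and the
parser assumes the same thing the grep/awk pipeline does, namely that the
time is the 4th whitespace-separated field of the 'Run completed in' line.

```python
import subprocess


def bench_argv(image, write=False, qemu_img="./qemu-img"):
    """Build the qemu-img bench command line used in the loop above."""
    argv = [qemu_img, "bench", "-c", "4096", "-d", "1", "-f", "qcow2",
            "-n", "-s", "1m", "-t", "none"]
    if write:
        argv.append("-w")
    argv.append(image)
    return argv


def parse_seconds(output):
    """Return the 4th field of the 'Run completed in' line, as awk does."""
    for line in output.splitlines():
        if "Run completed in" in line:
            return float(line.split()[3])
    raise ValueError("no 'Run completed in' line in output")


def run_bench(image, write=False):
    """Run one benchmark; needs a built qemu-img and an existing image."""
    out = subprocess.run(bench_argv(image, write), check=True,
                         capture_output=True, text=True).stdout
    return parse_seconds(out)
```

Calling run_bench("/ssd/t-rand.qcow2") and run_bench("/ssd/t-rand.qcow2",
write=True) would correspond to one read row and one write row of the
results table.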

results:

    +---------------------------+---------+---------+
    | file                      | master  | async   |
    +---------------------------+---------+---------+
    | /ssd/t-part-rand.qcow2    | 14.671  | 9.193   |
    +---------------------------+---------+---------+
    | /ssd/t-part-rand.qcow2 -w | 11.434  | 8.621   |
    +---------------------------+---------+---------+
    | /ssd/t-rand.qcow2         | 20.421  | 10.05   |
    +---------------------------+---------+---------+
    | /ssd/t-rand.qcow2 -w      | 11.097  | 8.915   |
    +---------------------------+---------+---------+
    | /ssd/t-reverse.qcow2      | 17.515  | 9.407   |
    +---------------------------+---------+---------+
    | /ssd/t-reverse.qcow2 -w   | 11.255  | 8.649   |
    +---------------------------+---------+---------+
    | /ssd/t-seq.qcow2          | 9.081   | 9.072   |
    +---------------------------+---------+---------+
    | /ssd/t-seq.qcow2 -w       | 8.761   | 8.747   |
    +---------------------------+---------+---------+
    | /tmp/t-part-rand.qcow2    | 41.179  | 41.37   |
    +---------------------------+---------+---------+
    | /tmp/t-part-rand.qcow2 -w | 54.097  | 55.323  |
    +---------------------------+---------+---------+
    | /tmp/t-rand.qcow2         | 711.899 | 514.339 |
    +---------------------------+---------+---------+
    | /tmp/t-rand.qcow2 -w      | 546.259 | 642.114 |
    +---------------------------+---------+---------+
    | /tmp/t-reverse.qcow2      | 86.065  | 96.522  |
    +---------------------------+---------+---------+
    | /tmp/t-reverse.qcow2 -w   | 46.557  | 48.499  |
    +---------------------------+---------+---------+
    | /tmp/t-seq.qcow2          | 33.804  | 33.862  |
    +---------------------------+---------+---------+
    | /tmp/t-seq.qcow2 -w       | 34.299  | 34.233  |
    +---------------------------+---------+---------+


The performance gain is clear, especially for reads and especially on the SSD.
On the HDD there is a degradation in the reverse case, but that is the least
likely case in practice, so it does not seem critical.
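
To put numbers on this, the ratio master/async can be computed directly
from the table (a value above 1 means the async version is faster). The
snippet below only copies the table; the names RESULTS and speedup are
made up for the example. Besides the reverse read case, it also shows a
ratio below 1 for /tmp/t-rand.qcow2 -w.

```python
# (master, async) run times in seconds, copied from the results table.
RESULTS = {
    "/ssd/t-part-rand.qcow2": (14.671, 9.193),
    "/ssd/t-part-rand.qcow2 -w": (11.434, 8.621),
    "/ssd/t-rand.qcow2": (20.421, 10.05),
    "/ssd/t-rand.qcow2 -w": (11.097, 8.915),
    "/ssd/t-reverse.qcow2": (17.515, 9.407),
    "/ssd/t-reverse.qcow2 -w": (11.255, 8.649),
    "/ssd/t-seq.qcow2": (9.081, 9.072),
    "/ssd/t-seq.qcow2 -w": (8.761, 8.747),
    "/tmp/t-part-rand.qcow2": (41.179, 41.37),
    "/tmp/t-part-rand.qcow2 -w": (54.097, 55.323),
    "/tmp/t-rand.qcow2": (711.899, 514.339),
    "/tmp/t-rand.qcow2 -w": (546.259, 642.114),
    "/tmp/t-reverse.qcow2": (86.065, 96.522),
    "/tmp/t-reverse.qcow2 -w": (46.557, 48.499),
    "/tmp/t-seq.qcow2": (33.804, 33.862),
    "/tmp/t-seq.qcow2 -w": (34.299, 34.233),
}


def speedup(master, new):
    """Ratio of master time to async time; > 1 means async is faster."""
    return master / new


if __name__ == "__main__":
    for name, (master, new) in RESULTS.items():
        print("{:28} {:.2f}x".format(name, speedup(master, new)))
```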

How images are generated:

    ==== gen-writes ======
    #!/usr/bin/env python
    import random
    import sys

    size = 4 * 1024 * 1024 * 1024
    block = 64 * 1024
    block2 = 1024 * 1024

    arg = sys.argv[1]

    if arg in ('rand', 'reverse', 'seq'):
        writes = list(range(0, size, block))

    if arg == 'rand':
        random.shuffle(writes)
    elif arg == 'reverse':
        writes.reverse()
    elif arg == 'part-rand':
        writes = []
        for off in range(0, size, block2):
            wr = list(range(off, off + block2, block))
            random.shuffle(wr)
            writes.extend(wr)
    elif arg != 'seq':
        sys.exit(1)

    for w in writes:
        print('write -P 0xff {} {}'.format(w, block))

    print('q')
    ==========================

    ===== gen-test-images.sh =====
    #!/bin/bash

    IMG_PATH=/ssd

    for name in seq reverse rand part-rand; do
        IMG=$IMG_PATH/t-$name.qcow2
        echo creating $IMG ...
        rm -f $IMG
        qemu-img create -f qcow2 $IMG 4G
        gen-writes $name | qemu-io $IMG
    done
    ==============================


Vladimir Sementsov-Ogievskiy (5):
  tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache
  block: introduce aio task pool
  block/qcow2: refactor qcow2_co_preadv_part
  block/qcow2: refactor qcow2_co_pwritev_part
  block/qcow2: introduce parallel subrequest handling in read and write

 qapi/block-core.json               |   2 +-
 block/qcow2.h                      |   3 +
 include/block/aio_task.h           |  54 ++++
 block/aio_task.c                   | 124 ++++++++
 block/qcow2.c                      | 461 +++++++++++++++++++----------
 block/Makefile.objs                |   2 +
 block/trace-events                 |   1 +
 tests/qemu-iotests/026.out         |  18 +-
 tests/qemu-iotests/026.out.nocache | 188 ++++++------
 9 files changed, 591 insertions(+), 262 deletions(-)
 create mode 100644 include/block/aio_task.h
 create mode 100644 block/aio_task.c

-- 
2.18.0




* [Qemu-devel] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
@ 2019-08-16 15:30 ` Vladimir Sementsov-Ogievskiy
  2019-09-09 21:58   ` [Qemu-devel] [Qemu-block] " John Snow
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 2/5] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-16 15:30 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den

qemu-io now prefixes its errors and warnings with "qemu-io:".
36b9986b08787019e fixed a lot of iotests output but forgot about
026.out.nocache. Fix it too.

Fixes: 99e98d7c9fc1a1639fad ("qemu-io: Use error_[gs]et_progname()")
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/026.out.nocache | 168 ++++++++++++++---------------
 1 file changed, 84 insertions(+), 84 deletions(-)

diff --git a/tests/qemu-iotests/026.out.nocache b/tests/qemu-iotests/026.out.nocache
index 1ca6cda15c..6dda95dfb4 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -14,8 +14,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 
 1 leaked clusters were found on the image.
@@ -23,8 +23,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 
 1 leaked clusters were found on the image.
@@ -42,8 +42,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 1 leaked clusters were found on the image.
@@ -51,8 +51,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_update; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 1 leaked clusters were found on the image.
@@ -136,8 +136,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_update; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -146,8 +146,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_update; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -168,8 +168,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: l2_update; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -178,8 +178,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_update; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 wrote 131072/131072 bytes at offset 0
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
@@ -198,15 +198,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_alloc_write; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_alloc_write; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 
 1 leaked clusters were found on the image.
@@ -224,15 +224,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_alloc_write; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l2_alloc_write; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 1 leaked clusters were found on the image.
@@ -250,15 +250,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: write_aio; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: write_aio; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -274,15 +274,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: write_aio; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: write_aio; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -298,15 +298,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_load; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_load; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -322,15 +322,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_load; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_load; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -346,15 +346,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_update_part; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_update_part; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -370,15 +370,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_update_part; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_update_part; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -394,15 +394,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc; errno: 5; imm: off; once: off; write 
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc; errno: 5; imm: off; once: off; write -b
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -418,15 +418,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -485,8 +485,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 55 leaked clusters were found on the image.
@@ -494,8 +494,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_hookup; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 251 leaked clusters were found on the image.
@@ -513,15 +513,15 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -537,8 +537,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 10 leaked clusters were found on the image.
@@ -546,8 +546,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_blocks; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 23 leaked clusters were found on the image.
@@ -565,8 +565,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 10 leaked clusters were found on the image.
@@ -574,8 +574,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_write_table; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 23 leaked clusters were found on the image.
@@ -593,8 +593,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write 
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 10 leaked clusters were found on the image.
@@ -602,8 +602,8 @@ This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: refblock_alloc_switch_table; errno: 28; imm: off; once: off; write -b
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 23 leaked clusters were found on the image.
@@ -639,8 +639,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_write_table; errno: 5; imm: off; once: off
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -651,8 +651,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_write_table; errno: 28; imm: off; once: off
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
@@ -663,8 +663,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_activate_table; errno: 5; imm: off; once: off
-Failed to flush the L2 table cache: Input/output error
-Failed to flush the refcount block cache: Input/output error
+qemu-io: Failed to flush the L2 table cache: Input/output error
+qemu-io: Failed to flush the refcount block cache: Input/output error
 write failed: Input/output error
 
 96 leaked clusters were found on the image.
@@ -677,8 +677,8 @@ No errors were found on the image.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
 Event: l1_grow_activate_table; errno: 28; imm: off; once: off
-Failed to flush the L2 table cache: No space left on device
-Failed to flush the refcount block cache: No space left on device
+qemu-io: Failed to flush the L2 table cache: No space left on device
+qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
 96 leaked clusters were found on the image.
-- 
2.18.0




* [Qemu-devel] [PATCH v4 2/5] block: introduce aio task pool
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache Vladimir Sementsov-Ogievskiy
@ 2019-08-16 15:30 ` Vladimir Sementsov-Ogievskiy
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-16 15:30 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den

Introduce a common interface for aio task loops. It is intended to
improve the performance of synchronous I/O loops in qcow2, block-stream,
copy-on-read, and possibly other places.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/block/aio_task.h |  54 +++++++++++++++++
 block/aio_task.c         | 124 +++++++++++++++++++++++++++++++++++++++
 block/Makefile.objs      |   2 +
 3 files changed, 180 insertions(+)
 create mode 100644 include/block/aio_task.h
 create mode 100644 block/aio_task.c

diff --git a/include/block/aio_task.h b/include/block/aio_task.h
new file mode 100644
index 0000000000..50bc1e1817
--- /dev/null
+++ b/include/block/aio_task.h
@@ -0,0 +1,54 @@
+/*
+ * Aio tasks loops
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_AIO_TASK_H
+#define BLOCK_AIO_TASK_H
+
+#include "qemu/coroutine.h"
+
+typedef struct AioTaskPool AioTaskPool;
+typedef struct AioTask AioTask;
+typedef int coroutine_fn (*AioTaskFunc)(AioTask *task);
+struct AioTask {
+    AioTaskPool *pool;
+    AioTaskFunc func;
+    int ret;
+};
+
+AioTaskPool *coroutine_fn aio_task_pool_new(int max_busy_tasks);
+void aio_task_pool_free(AioTaskPool *);
+
+/* error code of failed task or 0 if all is OK */
+int aio_task_pool_status(AioTaskPool *pool);
+
+bool aio_task_pool_empty(AioTaskPool *pool);
+
+/* User provides filled @task, however task->pool will be set automatically */
+void coroutine_fn aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);
+
+void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool);
+void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool);
+void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool);
+
+#endif /* BLOCK_AIO_TASK_H */
diff --git a/block/aio_task.c b/block/aio_task.c
new file mode 100644
index 0000000000..88989fa248
--- /dev/null
+++ b/block/aio_task.c
@@ -0,0 +1,124 @@
+/*
+ * Aio tasks loops
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "block/aio_task.h"
+
+struct AioTaskPool {
+    Coroutine *main_co;
+    int status;
+    int max_busy_tasks;
+    int busy_tasks;
+    bool waiting;
+};
+
+static void coroutine_fn aio_task_co(void *opaque)
+{
+    AioTask *task = opaque;
+    AioTaskPool *pool = task->pool;
+
+    assert(pool->busy_tasks < pool->max_busy_tasks);
+    pool->busy_tasks++;
+
+    task->ret = task->func(task);
+
+    pool->busy_tasks--;
+
+    if (task->ret < 0 && pool->status == 0) {
+        pool->status = task->ret;
+    }
+
+    g_free(task);
+
+    if (pool->waiting) {
+        pool->waiting = false;
+        aio_co_wake(pool->main_co);
+    }
+}
+
+void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
+{
+    assert(pool->busy_tasks > 0);
+    assert(qemu_coroutine_self() == pool->main_co);
+
+    pool->waiting = true;
+    qemu_coroutine_yield();
+
+    assert(!pool->waiting);
+    assert(pool->busy_tasks < pool->max_busy_tasks);
+}
+
+void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool)
+{
+    if (pool->busy_tasks < pool->max_busy_tasks) {
+        return;
+    }
+
+    aio_task_pool_wait_one(pool);
+}
+
+void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool)
+{
+    while (pool->busy_tasks > 0) {
+        aio_task_pool_wait_one(pool);
+    }
+}
+
+void coroutine_fn aio_task_pool_start_task(AioTaskPool *pool, AioTask *task)
+{
+    aio_task_pool_wait_slot(pool);
+
+    task->pool = pool;
+    qemu_coroutine_enter(qemu_coroutine_create(aio_task_co, task));
+}
+
+AioTaskPool *coroutine_fn aio_task_pool_new(int max_busy_tasks)
+{
+    AioTaskPool *pool = g_new0(AioTaskPool, 1);
+
+    pool->main_co = qemu_coroutine_self();
+    pool->max_busy_tasks = max_busy_tasks;
+
+    return pool;
+}
+
+void aio_task_pool_free(AioTaskPool *pool)
+{
+    g_free(pool);
+}
+
+int aio_task_pool_status(AioTaskPool *pool)
+{
+    if (!pool) {
+        return 0; /* Sugar for lazy allocation of aio pool */
+    }
+
+    return pool->status;
+}
+
+bool aio_task_pool_empty(AioTaskPool *pool)
+{
+    return pool->busy_tasks == 0;
+}
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 35f3bca4d9..c2eb8c8769 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -40,6 +40,8 @@ block-obj-y += throttle.o copy-on-read.o
 
 block-obj-y += crypto.o
 
+block-obj-y += aio_task.o
+
 common-obj-y += stream.o
 
 nfs.o-libs         := $(LIBNFS_LIBS)
-- 
2.18.0
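
Editorial note: the pool above keeps two pieces of state that are easy to
reason about in isolation: a busy-task counter bounded by max_busy_tasks,
and a status that remembers only the first failing task. The following is a
minimal sketch of just that bookkeeping, with the coroutine machinery left
out. ToyPool and all toy_pool_* helpers are hypothetical names invented for
illustration; they are not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for AioTaskPool: no coroutines, only the bookkeeping. */
typedef struct ToyPool {
    int status;          /* first negative task result, or 0 */
    int max_busy_tasks;  /* upper bound on concurrently running tasks */
    int busy_tasks;      /* tasks started but not yet finished */
} ToyPool;

/* Hypothetical helper: true if a new task may start without waiting. */
static bool toy_pool_has_slot(const ToyPool *pool)
{
    return pool->busy_tasks < pool->max_busy_tasks;
}

/* Hypothetical helper: account for a task that begins running. */
static void toy_pool_task_begin(ToyPool *pool)
{
    assert(toy_pool_has_slot(pool));
    pool->busy_tasks++;
}

/* Account for a finished task; only the first error is remembered. */
static void toy_pool_task_end(ToyPool *pool, int ret)
{
    pool->busy_tasks--;
    if (ret < 0 && pool->status == 0) {
        pool->status = ret;
    }
}

/* NULL-tolerant status query, so a lazily allocated pool reads as "OK". */
static int toy_pool_status(const ToyPool *pool)
{
    return pool ? pool->status : 0;
}
```

In this model a caller would check toy_pool_has_slot() before starting a
task, where the real code yields in aio_task_pool_wait_slot() instead.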




* [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache Vladimir Sementsov-Ogievskiy
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 2/5] block: introduce aio task pool Vladimir Sementsov-Ogievskiy
@ 2019-08-16 15:30 ` Vladimir Sementsov-Ogievskiy
  2019-09-13 10:01   ` Kevin Wolf
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 4/5] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-16 15:30 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den

A further patch will run the partial requests of the qcow2_co_preadv
iterations in parallel for performance reasons. To prepare for this,
move the part which may be parallelized into a separate function
(qcow2_co_preadv_task).

While at it, also move the reading of encrypted clusters into its own
function, as is already done for compressed reads.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json |   2 +-
 block/qcow2.c        | 205 +++++++++++++++++++++++--------------------
 2 files changed, 111 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0d43d4f37c..dd80aa11db 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3266,7 +3266,7 @@
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
             'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
             'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-            'cor_write', 'cluster_alloc_space', 'none'] }
+            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
 
 ##
 # @BlkdebugIOType:
diff --git a/block/qcow2.c b/block/qcow2.c
index 93ab7edcea..89afb4272e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1967,17 +1967,114 @@ out:
     return ret;
 }
 
+static coroutine_fn int
+qcow2_co_preadv_encrypted(BlockDriverState *bs,
+                           uint64_t file_cluster_offset,
+                           uint64_t offset,
+                           uint64_t bytes,
+                           QEMUIOVector *qiov,
+                           uint64_t qiov_offset)
+{
+    int ret;
+    BDRVQcow2State *s = bs->opaque;
+    uint8_t *buf;
+
+    assert(bs->encrypted && s->crypto);
+    assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+
+    /*
+     * For encrypted images, read everything into a temporary
+     * contiguous buffer on which the AES functions can work.
+     * Also, decryption in a separate buffer is better as it
+     * prevents the guest from learning information about the
+     * encrypted nature of the virtual disk.
+     */
+
+    buf = qemu_try_blockalign(s->data_file->bs, bytes);
+    if (buf == NULL) {
+        return -ENOMEM;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_READ_ENCRYPTED);
+    ret = bdrv_co_pread(s->data_file,
+                        file_cluster_offset + offset_into_cluster(s, offset),
+                        bytes, buf, 0);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+    if (qcow2_co_decrypt(bs, file_cluster_offset, offset, buf, bytes) < 0) {
+        ret = -EIO;
+        goto fail;
+    }
+    qemu_iovec_from_buf(qiov, qiov_offset, buf, bytes);
+
+fail:
+    qemu_vfree(buf);
+
+    return ret;
+}
+
+static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
+                                             QCow2ClusterType cluster_type,
+                                             uint64_t file_cluster_offset,
+                                             uint64_t offset, uint64_t bytes,
+                                             QEMUIOVector *qiov,
+                                             size_t qiov_offset)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int offset_in_cluster = offset_into_cluster(s, offset);
+
+    switch (cluster_type) {
+    case QCOW2_CLUSTER_ZERO_PLAIN:
+    case QCOW2_CLUSTER_ZERO_ALLOC:
+        /* Both zero types are handled in qcow2_co_preadv_part */
+        g_assert_not_reached();
+
+    case QCOW2_CLUSTER_UNALLOCATED:
+        assert(bs->backing); /* otherwise handled in qcow2_co_preadv_part */
+
+        BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
+        return bdrv_co_preadv_part(bs->backing, offset, bytes,
+                                   qiov, qiov_offset, 0);
+
+    case QCOW2_CLUSTER_COMPRESSED:
+        return qcow2_co_preadv_compressed(bs, file_cluster_offset,
+                                          offset, bytes, qiov, qiov_offset);
+
+    case QCOW2_CLUSTER_NORMAL:
+        if ((file_cluster_offset & 511) != 0) {
+            return -EIO;
+        }
+
+        if (bs->encrypted) {
+            return qcow2_co_preadv_encrypted(bs, file_cluster_offset,
+                                             offset, bytes, qiov, qiov_offset);
+        }
+
+        BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+        return bdrv_co_preadv_part(s->data_file,
+                                   file_cluster_offset + offset_in_cluster,
+                                   bytes, qiov, qiov_offset, 0);
+
+    default:
+        g_assert_not_reached();
+    }
+
+    g_assert_not_reached();
+}
+
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
                                              size_t qiov_offset, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
-    int offset_in_cluster;
     int ret;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t cluster_offset = 0;
-    uint8_t *cluster_data = NULL;
 
     while (bytes != 0) {
 
@@ -1992,111 +2089,29 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
-            goto fail;
+            return ret;
         }
 
-        offset_in_cluster = offset_into_cluster(s, offset);
-
-        switch (ret) {
-        case QCOW2_CLUSTER_UNALLOCATED:
-
-            if (bs->backing) {
-                BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
-                ret = bdrv_co_preadv_part(bs->backing, offset, cur_bytes,
-                                          qiov, qiov_offset, 0);
-                if (ret < 0) {
-                    goto fail;
-                }
-            } else {
-                /* Note: in this case, no need to wait */
-                qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
-            }
-            break;
-
-        case QCOW2_CLUSTER_ZERO_PLAIN:
-        case QCOW2_CLUSTER_ZERO_ALLOC:
+        if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
+            ret == QCOW2_CLUSTER_ZERO_ALLOC ||
+            (ret == QCOW2_CLUSTER_UNALLOCATED && !bs->backing))
+        {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
-            break;
-
-        case QCOW2_CLUSTER_COMPRESSED:
-            ret = qcow2_co_preadv_compressed(bs, cluster_offset,
-                                             offset, cur_bytes,
-                                             qiov, qiov_offset);
+        } else {
+            ret = qcow2_co_preadv_task(bs, ret,
+                                       cluster_offset, offset, cur_bytes,
+                                       qiov, qiov_offset);
             if (ret < 0) {
-                goto fail;
-            }
-
-            break;
-
-        case QCOW2_CLUSTER_NORMAL:
-            if ((cluster_offset & 511) != 0) {
-                ret = -EIO;
-                goto fail;
-            }
-
-            if (bs->encrypted) {
-                assert(s->crypto);
-
-                /*
-                 * For encrypted images, read everything into a temporary
-                 * contiguous buffer on which the AES functions can work.
-                 */
-                if (!cluster_data) {
-                    cluster_data =
-                        qemu_try_blockalign(s->data_file->bs,
-                                            QCOW_MAX_CRYPT_CLUSTERS
-                                            * s->cluster_size);
-                    if (cluster_data == NULL) {
-                        ret = -ENOMEM;
-                        goto fail;
-                    }
-                }
-
-                assert(cur_bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
-
-                BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-                ret = bdrv_co_pread(s->data_file,
-                                    cluster_offset + offset_in_cluster,
-                                    cur_bytes, cluster_data, 0);
-                if (ret < 0) {
-                    goto fail;
-                }
-
-                assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
-                assert((cur_bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
-                if (qcow2_co_decrypt(bs, cluster_offset, offset,
-                                     cluster_data, cur_bytes) < 0) {
-                    ret = -EIO;
-                    goto fail;
-                }
-                qemu_iovec_from_buf(qiov, qiov_offset, cluster_data, cur_bytes);
-            } else {
-                BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-                ret = bdrv_co_preadv_part(s->data_file,
-                                          cluster_offset + offset_in_cluster,
-                                          cur_bytes, qiov, qiov_offset, 0);
-                if (ret < 0) {
-                    goto fail;
-                }
+                return ret;
             }
-            break;
-
-        default:
-            g_assert_not_reached();
-            ret = -EIO;
-            goto fail;
         }
 
         bytes -= cur_bytes;
         offset += cur_bytes;
         qiov_offset += cur_bytes;
     }
-    ret = 0;
-
-fail:
-    qemu_vfree(cluster_data);
 
-    return ret;
+    return 0;
 }
 
 /* Check if it's possible to merge a write request with the writing of
-- 
2.18.0
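
Editorial note: after this refactoring the read loop makes one decision per
chunk: either zero-fill the guest buffer inline, or hand the chunk to
qcow2_co_preadv_task. The sketch below restates that decision as a
standalone predicate. The ToyClusterType enum and toy_read_is_zero_fill()
are hypothetical names made up for illustration and only mirror the
condition visible in the diff; they are not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the cluster types the read loop distinguishes. */
typedef enum ToyClusterType {
    TOY_CLUSTER_UNALLOCATED,
    TOY_CLUSTER_ZERO_PLAIN,
    TOY_CLUSTER_ZERO_ALLOC,
    TOY_CLUSTER_NORMAL,
    TOY_CLUSTER_COMPRESSED,
} ToyClusterType;

/*
 * Returns true if the chunk is satisfied by zero-filling the guest buffer
 * in the calling loop, false if it needs real I/O in a separate task.
 */
static bool toy_read_is_zero_fill(ToyClusterType type, bool has_backing)
{
    switch (type) {
    case TOY_CLUSTER_ZERO_PLAIN:
    case TOY_CLUSTER_ZERO_ALLOC:
        return true;
    case TOY_CLUSTER_UNALLOCATED:
        /* Without a backing file there is nothing to read from. */
        return !has_backing;
    case TOY_CLUSTER_NORMAL:
    case TOY_CLUSTER_COMPRESSED:
        return false;
    }

    assert(0 && "unknown cluster type");
    return false;
}
```

Keeping the zero-fill cases in the caller means the task function only ever
sees chunks that involve real I/O, which is why it can assert on the zero
types.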




* [Qemu-devel] [PATCH v4 4/5] block/qcow2: refactor qcow2_co_pwritev_part
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
@ 2019-08-16 15:30 ` Vladimir Sementsov-Ogievskiy
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 5/5] block/qcow2: introduce parallel subrequest handling in read and write Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-16 15:30 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den

Similarly to the previous commit, prepare for parallelizing the
write-loop iterations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 153 +++++++++++++++++++++++++++++---------------------
 1 file changed, 89 insertions(+), 64 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 89afb4272e..3aaa180e2b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2234,6 +2234,87 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
     return 0;
 }
 
+/*
+ * qcow2_co_pwritev_task
+ * Called with s->lock unlocked
+ * l2meta  - if not NULL, qcow2_co_pwritev_task() will consume it. Caller must
+ *           not use it after the qcow2_co_pwritev_task() call
+ */
+static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
+                                              uint64_t file_cluster_offset,
+                                              uint64_t offset, uint64_t bytes,
+                                              QEMUIOVector *qiov,
+                                              uint64_t qiov_offset,
+                                              QCowL2Meta *l2meta)
+{
+    int ret;
+    BDRVQcow2State *s = bs->opaque;
+    void *crypt_buf = NULL;
+    int offset_in_cluster = offset_into_cluster(s, offset);
+    QEMUIOVector encrypted_qiov;
+
+    if (bs->encrypted) {
+        assert(s->crypto);
+        assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+        crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
+        if (crypt_buf == NULL) {
+            ret = -ENOMEM;
+            goto out_unlocked;
+        }
+        qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
+
+        if (qcow2_co_encrypt(bs, file_cluster_offset, offset,
+                             crypt_buf, bytes) < 0) {
+            ret = -EIO;
+            goto out_unlocked;
+        }
+
+        qemu_iovec_init_buf(&encrypted_qiov, crypt_buf, bytes);
+        qiov = &encrypted_qiov;
+        qiov_offset = 0;
+    }
+
+    /* Try to efficiently initialize the physical space with zeroes */
+    ret = handle_alloc_space(bs, l2meta);
+    if (ret < 0) {
+        goto out_unlocked;
+    }
+
+    /*
+     * If we need to do COW, check if it's possible to merge the
+     * writing of the guest data together with that of the COW regions.
+     * If it's not possible (or not necessary) then write the
+     * guest data now.
+     */
+    if (!merge_cow(offset, bytes, qiov, qiov_offset, l2meta)) {
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+        trace_qcow2_writev_data(qemu_coroutine_self(),
+                                file_cluster_offset + offset_in_cluster);
+        ret = bdrv_co_pwritev_part(s->data_file,
+                                   file_cluster_offset + offset_in_cluster,
+                                   bytes, qiov, qiov_offset, 0);
+        if (ret < 0) {
+            goto out_unlocked;
+        }
+    }
+
+    qemu_co_mutex_lock(&s->lock);
+
+    ret = qcow2_handle_l2meta(bs, &l2meta, true);
+    goto out_locked;
+
+out_unlocked:
+    qemu_co_mutex_lock(&s->lock);
+
+out_locked:
+    qcow2_handle_l2meta(bs, &l2meta, false);
+    qemu_co_mutex_unlock(&s->lock);
+
+    qemu_vfree(crypt_buf);
+
+    return ret;
+}
+
 static coroutine_fn int qcow2_co_pwritev_part(
         BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         QEMUIOVector *qiov, size_t qiov_offset, int flags)
@@ -2243,15 +2324,10 @@ static coroutine_fn int qcow2_co_pwritev_part(
     int ret;
     unsigned int cur_bytes; /* number of sectors in current iteration */
     uint64_t cluster_offset;
-    QEMUIOVector encrypted_qiov;
-    uint64_t bytes_done = 0;
-    uint8_t *cluster_data = NULL;
     QCowL2Meta *l2meta = NULL;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
-    qemu_co_mutex_lock(&s->lock);
-
     while (bytes != 0) {
 
         l2meta = NULL;
@@ -2265,6 +2341,8 @@ static coroutine_fn int qcow2_co_pwritev_part(
                             - offset_in_cluster);
         }
 
+        qemu_co_mutex_lock(&s->lock);
+
         ret = qcow2_alloc_cluster_offset(bs, offset, &cur_bytes,
                                          &cluster_offset, &l2meta);
         if (ret < 0) {
@@ -2282,73 +2360,20 @@ static coroutine_fn int qcow2_co_pwritev_part(
 
         qemu_co_mutex_unlock(&s->lock);
 
-        if (bs->encrypted) {
-            assert(s->crypto);
-            if (!cluster_data) {
-                cluster_data = qemu_try_blockalign(bs->file->bs,
-                                                   QCOW_MAX_CRYPT_CLUSTERS
-                                                   * s->cluster_size);
-                if (cluster_data == NULL) {
-                    ret = -ENOMEM;
-                    goto out_unlocked;
-                }
-            }
-
-            assert(cur_bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
-            qemu_iovec_to_buf(qiov, qiov_offset + bytes_done,
-                              cluster_data, cur_bytes);
-
-            if (qcow2_co_encrypt(bs, cluster_offset, offset,
-                                 cluster_data, cur_bytes) < 0) {
-                ret = -EIO;
-                goto out_unlocked;
-            }
-
-            qemu_iovec_init_buf(&encrypted_qiov, cluster_data, cur_bytes);
-        }
-
-        /* Try to efficiently initialize the physical space with zeroes */
-        ret = handle_alloc_space(bs, l2meta);
+        ret = qcow2_co_pwritev_task(bs, cluster_offset, offset, cur_bytes,
+                                    qiov, qiov_offset, l2meta);
+        l2meta = NULL; /* l2meta is consumed by qcow2_co_pwritev_task() */
         if (ret < 0) {
-            goto out_unlocked;
-        }
-
-        /* If we need to do COW, check if it's possible to merge the
-         * writing of the guest data together with that of the COW regions.
-         * If it's not possible (or not necessary) then write the
-         * guest data now. */
-        if (!merge_cow(offset, cur_bytes,
-                       bs->encrypted ? &encrypted_qiov : qiov,
-                       bs->encrypted ? 0 : qiov_offset + bytes_done, l2meta))
-        {
-            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-            trace_qcow2_writev_data(qemu_coroutine_self(),
-                                    cluster_offset + offset_in_cluster);
-            ret = bdrv_co_pwritev_part(
-                    s->data_file, cluster_offset + offset_in_cluster, cur_bytes,
-                    bs->encrypted ? &encrypted_qiov : qiov,
-                    bs->encrypted ? 0 : qiov_offset + bytes_done, 0);
-            if (ret < 0) {
-                goto out_unlocked;
-            }
-        }
-
-        qemu_co_mutex_lock(&s->lock);
-
-        ret = qcow2_handle_l2meta(bs, &l2meta, true);
-        if (ret) {
-            goto out_locked;
+            goto fail_nometa;
         }
 
         bytes -= cur_bytes;
         offset += cur_bytes;
-        bytes_done += cur_bytes;
+        qiov_offset += cur_bytes;
         trace_qcow2_writev_done_part(qemu_coroutine_self(), cur_bytes);
     }
     ret = 0;
-    goto out_locked;
 
-out_unlocked:
     qemu_co_mutex_lock(&s->lock);
 
 out_locked:
@@ -2356,7 +2381,7 @@ out_locked:
 
     qemu_co_mutex_unlock(&s->lock);
 
-    qemu_vfree(cluster_data);
+fail_nometa:
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
-- 
2.18.0
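
Editorial note: with bytes_done gone, each loop iteration advances three
cursors together: the guest offset, the remaining byte count, and
qiov_offset. The sketch below shows that pattern on its own. It assumes,
purely for illustration, that every chunk ends at a cluster boundary; in
the patch the chunk length comes from qcow2_alloc_cluster_offset() instead.
ToyChunk and toy_split_request() are hypothetical names, and cluster_size
must be non-zero.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct ToyChunk {
    uint64_t offset;      /* guest offset of this sub-request */
    uint64_t bytes;       /* length of this sub-request */
    size_t qiov_offset;   /* where its data starts inside the I/O vector */
} ToyChunk;

/*
 * Split [offset, offset + bytes) into chunks that do not cross a cluster
 * boundary. Returns the number of chunks written to @out (at most
 * @max_chunks).
 */
static size_t toy_split_request(uint64_t offset, uint64_t bytes,
                                size_t qiov_offset, uint64_t cluster_size,
                                ToyChunk *out, size_t max_chunks)
{
    size_t n = 0;

    while (bytes != 0 && n < max_chunks) {
        uint64_t in_cluster = offset % cluster_size;
        uint64_t cur_bytes = cluster_size - in_cluster;

        if (cur_bytes > bytes) {
            cur_bytes = bytes;
        }

        out[n++] = (ToyChunk) {
            .offset = offset,
            .bytes = cur_bytes,
            .qiov_offset = qiov_offset,
        };

        /* All three cursors move by the same amount. */
        bytes -= cur_bytes;
        offset += cur_bytes;
        qiov_offset += cur_bytes;
    }

    return n;
}
```

Because every chunk carries its own qiov_offset, the chunks no longer depend
on loop-local state, which is what allows them to be run as independent
tasks later.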




* [Qemu-devel] [PATCH v4 5/5] block/qcow2: introduce parallel subrequest handling in read and write
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 4/5] block/qcow2: refactor qcow2_co_pwritev_part Vladimir Sementsov-Ogievskiy
@ 2019-08-16 15:30 ` Vladimir Sementsov-Ogievskiy
  2019-09-03 12:16 ` [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
  2019-09-13  8:58 ` Max Reitz
  6 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-08-16 15:30 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, vsementsov, armbru, qemu-devel, mreitz, stefanha, den

This improves performance for fragmented qcow2 images. It also affects
iotest 026, increasing the number of leaked clusters, which is not
surprising when we run several sub-requests of a qcow2 request in parallel.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h                      |   3 +
 block/qcow2.c                      | 125 ++++++++++++++++++++++++++---
 block/trace-events                 |   1 +
 tests/qemu-iotests/026.out         |  18 +++--
 tests/qemu-iotests/026.out.nocache |  20 ++---
 5 files changed, 138 insertions(+), 29 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 998bcdaef1..fdfa9c31cd 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -65,6 +65,9 @@
 #define QCOW2_MAX_BITMAPS 65535
 #define QCOW2_MAX_BITMAP_DIRECTORY_SIZE (1024 * QCOW2_MAX_BITMAPS)
 
+/* Maximum number of parallel sub-requests per guest request */
+#define QCOW2_MAX_WORKERS 8
+
 /* indicate that the refcount of the referenced cluster is exactly one. */
 #define QCOW_OFLAG_COPIED     (1ULL << 63)
 /* indicate that the cluster is compressed (they never have the copied flag) */
diff --git a/block/qcow2.c b/block/qcow2.c
index 3aaa180e2b..36b41e8536 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -40,6 +40,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "crypto.h"
+#include "block/aio_task.h"
 
 /*
   Differences with QCOW:
@@ -2017,6 +2018,60 @@ fail:
     return ret;
 }
 
+typedef struct Qcow2AioTask {
+    AioTask task;
+
+    BlockDriverState *bs;
+    QCow2ClusterType cluster_type; /* only for read */
+    uint64_t file_cluster_offset;
+    uint64_t offset;
+    uint64_t bytes;
+    QEMUIOVector *qiov;
+    uint64_t qiov_offset;
+    QCowL2Meta *l2meta; /* only for write */
+} Qcow2AioTask;
+
+static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task);
+static coroutine_fn int qcow2_add_task(BlockDriverState *bs,
+                                       AioTaskPool *pool,
+                                       AioTaskFunc func,
+                                       QCow2ClusterType cluster_type,
+                                       uint64_t file_cluster_offset,
+                                       uint64_t offset,
+                                       uint64_t bytes,
+                                       QEMUIOVector *qiov,
+                                       size_t qiov_offset,
+                                       QCowL2Meta *l2meta)
+{
+    Qcow2AioTask local_task;
+    Qcow2AioTask *task = pool ? g_new(Qcow2AioTask, 1) : &local_task;
+
+    *task = (Qcow2AioTask) {
+        .task.func = func,
+        .bs = bs,
+        .cluster_type = cluster_type,
+        .qiov = qiov,
+        .file_cluster_offset = file_cluster_offset,
+        .offset = offset,
+        .bytes = bytes,
+        .qiov_offset = qiov_offset,
+        .l2meta = l2meta,
+    };
+
+    trace_qcow2_add_task(qemu_coroutine_self(), bs, pool,
+                         func == qcow2_co_preadv_task_entry ? "read" : "write",
+                         cluster_type, file_cluster_offset, offset, bytes,
+                         qiov, qiov_offset);
+
+    if (!pool) {
+        return func(&task->task);
+    }
+
+    aio_task_pool_start_task(pool, &task->task);
+
+    return 0;
+}
+
 static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
                                              QCow2ClusterType cluster_type,
                                              uint64_t file_cluster_offset,
@@ -2066,18 +2121,28 @@ static coroutine_fn int qcow2_co_preadv_task(BlockDriverState *bs,
     g_assert_not_reached();
 }
 
+static coroutine_fn int qcow2_co_preadv_task_entry(AioTask *task)
+{
+    Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
+
+    assert(!t->l2meta);
+
+    return qcow2_co_preadv_task(t->bs, t->cluster_type, t->file_cluster_offset,
+                                t->offset, t->bytes, t->qiov, t->qiov_offset);
+}
+
 static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
                                              uint64_t offset, uint64_t bytes,
                                              QEMUIOVector *qiov,
                                              size_t qiov_offset, int flags)
 {
     BDRVQcow2State *s = bs->opaque;
-    int ret;
+    int ret = 0;
     unsigned int cur_bytes; /* number of bytes in current iteration */
     uint64_t cluster_offset = 0;
+    AioTaskPool *aio = NULL;
 
-    while (bytes != 0) {
-
+    while (bytes != 0 && aio_task_pool_status(aio) == 0) {
         /* prepare next request */
         cur_bytes = MIN(bytes, INT_MAX);
         if (s->crypto) {
@@ -2089,7 +2154,7 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         ret = qcow2_get_cluster_offset(bs, offset, &cur_bytes, &cluster_offset);
         qemu_co_mutex_unlock(&s->lock);
         if (ret < 0) {
-            return ret;
+            goto out;
         }
 
         if (ret == QCOW2_CLUSTER_ZERO_PLAIN ||
@@ -2098,11 +2163,14 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         {
             qemu_iovec_memset(qiov, qiov_offset, 0, cur_bytes);
         } else {
-            ret = qcow2_co_preadv_task(bs, ret,
-                                       cluster_offset, offset, cur_bytes,
-                                       qiov, qiov_offset);
+            if (!aio && cur_bytes != bytes) {
+                aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
+            }
+            ret = qcow2_add_task(bs, aio, qcow2_co_preadv_task_entry, ret,
+                                 cluster_offset, offset, cur_bytes,
+                                 qiov, qiov_offset, NULL);
             if (ret < 0) {
-                return ret;
+                goto out;
             }
         }
 
@@ -2111,7 +2179,16 @@ static coroutine_fn int qcow2_co_preadv_part(BlockDriverState *bs,
         qiov_offset += cur_bytes;
     }
 
-    return 0;
+out:
+    if (aio) {
+        aio_task_pool_wait_all(aio);
+        if (ret == 0) {
+            ret = aio_task_pool_status(aio);
+        }
+        g_free(aio);
+    }
+
+    return ret;
 }
 
 /* Check if it's possible to merge a write request with the writing of
@@ -2315,6 +2392,17 @@ out_locked:
     return ret;
 }
 
+static coroutine_fn int qcow2_co_pwritev_task_entry(AioTask *task)
+{
+    Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
+
+    assert(!t->cluster_type);
+
+    return qcow2_co_pwritev_task(t->bs, t->file_cluster_offset,
+                                 t->offset, t->bytes, t->qiov, t->qiov_offset,
+                                 t->l2meta);
+}
+
 static coroutine_fn int qcow2_co_pwritev_part(
         BlockDriverState *bs, uint64_t offset, uint64_t bytes,
         QEMUIOVector *qiov, size_t qiov_offset, int flags)
@@ -2325,10 +2413,11 @@ static coroutine_fn int qcow2_co_pwritev_part(
     unsigned int cur_bytes; /* number of sectors in current iteration */
     uint64_t cluster_offset;
     QCowL2Meta *l2meta = NULL;
+    AioTaskPool *aio = NULL;
 
     trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
 
-    while (bytes != 0) {
+    while (bytes != 0 && aio_task_pool_status(aio) == 0) {
 
         l2meta = NULL;
 
@@ -2360,8 +2449,12 @@ static coroutine_fn int qcow2_co_pwritev_part(
 
         qemu_co_mutex_unlock(&s->lock);
 
-        ret = qcow2_co_pwritev_task(bs, cluster_offset, offset, cur_bytes,
-                                    qiov, qiov_offset, l2meta);
+        if (!aio && cur_bytes != bytes) {
+            aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
+        }
+        ret = qcow2_add_task(bs, aio, qcow2_co_pwritev_task_entry, 0,
+                             cluster_offset, offset, cur_bytes,
+                             qiov, qiov_offset, l2meta);
         l2meta = NULL; /* l2meta is consumed by qcow2_co_pwritev_task() */
         if (ret < 0) {
             goto fail_nometa;
@@ -2382,6 +2475,14 @@ out_locked:
     qemu_co_mutex_unlock(&s->lock);
 
 fail_nometa:
+    if (aio) {
+        aio_task_pool_wait_all(aio);
+        if (ret == 0) {
+            ret = aio_task_pool_status(aio);
+        }
+        g_free(aio);
+    }
+
     trace_qcow2_writev_done_req(qemu_coroutine_self(), ret);
 
     return ret;
diff --git a/block/trace-events b/block/trace-events
index d724df0117..7f51550ba3 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
 file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64
 
 # qcow2.c
+qcow2_add_task(void *co, void *bs, void *pool, const char *action, int cluster_type, uint64_t file_cluster_offset, uint64_t offset, uint64_t bytes, void *qiov, size_t qiov_offset) "co %p bs %p pool %p: %s: cluster_type %d file_cluster_offset %" PRIu64 " offset %" PRIu64 " bytes %" PRIu64 " qiov %p qiov_offset %zu"
 qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
 qcow2_writev_done_req(void *co, int ret) "co %p ret %d"
 qcow2_writev_start_part(void *co) "co %p"
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index fb89b8480c..4849c9c90a 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -481,7 +481,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-55 leaked clusters were found on the image.
+119 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -508,7 +508,9 @@ Event: refblock_alloc_write; errno: 28; imm: off; once: off; write
 qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
-No errors were found on the image.
+
+64 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write -b
@@ -533,7 +535,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-10 leaked clusters were found on the image.
+74 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -542,7 +544,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-23 leaked clusters were found on the image.
+87 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -561,7 +563,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-10 leaked clusters were found on the image.
+74 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -570,7 +572,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-23 leaked clusters were found on the image.
+87 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -589,7 +591,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-10 leaked clusters were found on the image.
+74 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
@@ -598,7 +600,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-23 leaked clusters were found on the image.
+87 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 
 === L1 growth tests ===
diff --git a/tests/qemu-iotests/026.out.nocache b/tests/qemu-iotests/026.out.nocache
index 6dda95dfb4..6b56df7788 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -489,7 +489,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-55 leaked clusters were found on the image.
+119 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -516,8 +516,10 @@ Event: refblock_alloc_write; errno: 28; imm: off; once: off; write
 qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
-No errors were found on the image.
-Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
+
+64 leaked clusters were found on the image.
+This means waste of disk space, but no harm to data.
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 
 Event: refblock_alloc_write; errno: 28; imm: off; once: off; write -b
 qemu-io: Failed to flush the L2 table cache: No space left on device
@@ -541,7 +543,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-10 leaked clusters were found on the image.
+74 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -550,7 +552,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-23 leaked clusters were found on the image.
+87 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -569,7 +571,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-10 leaked clusters were found on the image.
+74 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -578,7 +580,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-23 leaked clusters were found on the image.
+87 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -597,7 +599,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-10 leaked clusters were found on the image.
+74 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
 
@@ -606,7 +608,7 @@ qemu-io: Failed to flush the L2 table cache: No space left on device
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device
 
-23 leaked clusters were found on the image.
+87 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 
 === L1 growth tests ===
-- 
2.18.0
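
Editorial sketch of the control flow this patch introduces, modelled in
Python with asyncio rather than QEMU coroutines. It only mirrors the shape
of the loops above: the pool is created lazily once the request turns out to
be fragmented, submission stops as soon as a subrequest has failed, and the
final status is the first error seen. The names TaskPool, read_part and
make_reader and the value of MAX_WORKERS are hypothetical, and running the
task inline when no pool exists is an assumption about qcow2_add_task, whose
body is not part of this excerpt.

```python
import asyncio

MAX_WORKERS = 8  # hypothetical stand-in for QCOW2_MAX_WORKERS


class TaskPool:
    """Toy bounded pool: at most max_workers tasks are in flight at once."""

    def __init__(self, max_workers):
        self._sem = asyncio.Semaphore(max_workers)
        self._tasks = []
        self.status = 0  # first negative return value seen, else 0

    async def _run(self, func, *args):
        try:
            ret = await func(*args)
            if ret < 0 and self.status == 0:
                self.status = ret
        finally:
            self._sem.release()

    async def start_task(self, func, *args):
        # The submitter waits here while all worker slots are busy.
        await self._sem.acquire()
        self._tasks.append(asyncio.ensure_future(self._run(func, *args)))

    async def wait_all(self):
        await asyncio.gather(*self._tasks)


async def read_part(chunks, read_chunk):
    """chunks: list of (offset, length) pieces that make up one request."""
    ret = 0
    pool = None
    remaining = sum(length for _, length in chunks)
    for offset, length in chunks:
        if pool is not None and pool.status != 0:
            break  # a subrequest already failed: stop submitting
        if pool is None and length != remaining:
            pool = TaskPool(MAX_WORKERS)  # the request is fragmented
        if pool is None:
            # Assumption: without a pool the task simply runs inline.
            ret = await read_chunk(offset, length)
        else:
            await pool.start_task(read_chunk, offset, length)
        if ret < 0:
            break
        remaining -= length
    if pool is not None:
        await pool.wait_all()
        if ret == 0:
            ret = pool.status
    return ret


def make_reader(log, fail_at=()):
    """Build a fake chunk reader that records offsets and can fail on demand."""
    async def read_chunk(offset, length):
        await asyncio.sleep(0)  # yield, as a real I/O request would
        log.append(offset)
        return -5 if offset in fail_at else 0  # -5 mimics -EIO
    return read_chunk
```

A request consisting of a single piece never allocates a pool in this model,
which matches the "cur_bytes != bytes" check in the patch.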




* Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 5/5] block/qcow2: introduce parallel subrequest handling in read and write Vladimir Sementsov-Ogievskiy
@ 2019-09-03 12:16 ` Vladimir Sementsov-Ogievskiy
  2019-09-13  8:58 ` Max Reitz
  6 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-03 12:16 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, Denis Lunev, armbru, qemu-devel, mreitz, stefanha

Ping: Stefan's branch has been merged into master, so this series is now based on master.

16.08.2019 18:30, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is an asynchronous scheme for handling fragmented qcow2
> reads and writes. Both the qcow2 read and write functions loop through
> sequential portions of data. The series aims to parallelize these
> loop iterations.
> It improves performance for fragmented qcow2 images; I've tested it
> as described below.
> 
> v4 [perf results not updated]:
> 01: new patch. Unrelated, but need to fix 026 before the series to
>      correctly fix it after :)
> 02: - use coroutine_fn where appropriate (i.e. in aio_task_pool_new too)
>      - add Max's r-b
> 03,04: add Max's r-b
> 05: fix 026 output
> 
> v3 (by Max's comments) [perf results not updated]:
> 
> 01: - use coroutine_fn where appropriate !!!!!!!!!!!!!!!!!!!!!!!
>      - add aio_task_pool_free
>      - add some comments
>      - move header to include/block
>      - s/wait_done/waiting
> 02: - Rewrite note about decryption in guest buffers [thx to Eric]
>      - separate g_assert_not_reached for QCOW2_CLUSTER_ZERO_*
>      - drop return after g_assert_not_reached
> 03: - drop bytes_done and correctly use qiov_offset
>      - fix comment
> 04: - move QCOW2_MAX_WORKERS to block/qcow2.h
>      - initialize ret in qcow2_co_preadv_part
> Based-on: https://github.com/stefanha/qemu/commits/block
> 
> 
> v2: changed a lot, as
>   1. a lot of preparatory work around locks, hd_qiovs and threads for
>      encryption has been done
>   2. I decided to put the async request handling API into a separate file, so
>      that it can be reused for backup, stream and copy-on-read to improve
>      their performance too. Mirror and qemu-img convert have their own async
>      request handling; maybe we'll finally be able to merge all this similar
>      code into one feature.
>      Note that not all API calls are used in qcow2; some will be needed in
>      later steps for parallelizing other io loops.
> 
> About testing:
> 
> I have four 4G qcow2 images (with default 64k block size) on my ssd disk:
> t-seq.qcow2 - sequentially written qcow2 image
> t-reverse.qcow2 - filled by writing 64k portions from end to the start
> t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
> t-part-rand.qcow2 - filled by shuffling order of 64k writes in 1m clusters
> (see the image-generation source code at the end for details)
> 
> and I've done several runs like the following (sequential io by 1mb chunks):
> 
>      out=/tmp/block; echo > $out; cat /tmp/files | while read file; do for wr in {"","-w"}; do echo "$file" $wr; ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $wr "$file" | grep 'Run completed in' | awk '{print $4}' >> $out; done; done
> 
> 
> short info about parameters:
>    -w - do writes (otherwise do reads)
>    -c - count of blocks
>    -s - block size
>    -t none - disable cache
>    -n - native aio
>    -d 1 - don't use parallel requests provided by qemu-img bench itself
> 
> results:
> 
>      +---------------------------+---------+---------+
>      | file                      | master  | async   |
>      +---------------------------+---------+---------+
>      | /ssd/t-part-rand.qcow2    | 14.671  | 9.193   |
>      +---------------------------+---------+---------+
>      | /ssd/t-part-rand.qcow2 -w | 11.434  | 8.621   |
>      +---------------------------+---------+---------+
>      | /ssd/t-rand.qcow2         | 20.421  | 10.05   |
>      +---------------------------+---------+---------+
>      | /ssd/t-rand.qcow2 -w      | 11.097  | 8.915   |
>      +---------------------------+---------+---------+
>      | /ssd/t-reverse.qcow2      | 17.515  | 9.407   |
>      +---------------------------+---------+---------+
>      | /ssd/t-reverse.qcow2 -w   | 11.255  | 8.649   |
>      +---------------------------+---------+---------+
>      | /ssd/t-seq.qcow2          | 9.081   | 9.072   |
>      +---------------------------+---------+---------+
>      | /ssd/t-seq.qcow2 -w       | 8.761   | 8.747   |
>      +---------------------------+---------+---------+
>      | /tmp/t-part-rand.qcow2    | 41.179  | 41.37   |
>      +---------------------------+---------+---------+
>      | /tmp/t-part-rand.qcow2 -w | 54.097  | 55.323  |
>      +---------------------------+---------+---------+
>      | /tmp/t-rand.qcow2         | 711.899 | 514.339 |
>      +---------------------------+---------+---------+
>      | /tmp/t-rand.qcow2 -w      | 546.259 | 642.114 |
>      +---------------------------+---------+---------+
>      | /tmp/t-reverse.qcow2      | 86.065  | 96.522  |
>      +---------------------------+---------+---------+
>      | /tmp/t-reverse.qcow2 -w   | 46.557  | 48.499  |
>      +---------------------------+---------+---------+
>      | /tmp/t-seq.qcow2          | 33.804  | 33.862  |
>      +---------------------------+---------+---------+
>      | /tmp/t-seq.qcow2 -w       | 34.299  | 34.233  |
>      +---------------------------+---------+---------+
> 
> 
> The performance gain is clear, especially for reads and especially on ssd
> (for example, t-rand reads go from 20.421 to 10.05). On hdd there is a
> degradation for the reverse case, but this is the least likely case in
> practice and does not seem critical.
> 
> How images are generated:
> 
>      ==== gen-writes ======
>      #!/usr/bin/env python
>      import random
>      import sys
> 
>      size = 4 * 1024 * 1024 * 1024
>      block = 64 * 1024
>      block2 = 1024 * 1024
> 
>      arg = sys.argv[1]
> 
>      if arg in ('rand', 'reverse', 'seq'):
>          writes = list(range(0, size, block))
> 
>      if arg == 'rand':
>          random.shuffle(writes)
>      elif arg == 'reverse':
>          writes.reverse()
>      elif arg == 'part-rand':
>          writes = []
>          for off in range(0, size, block2):
>              wr = list(range(off, off + block2, block))
>              random.shuffle(wr)
>              writes.extend(wr)
>      elif arg != 'seq':
>          sys.exit(1)
> 
>      for w in writes:
>          print('write -P 0xff {} {}'.format(w, block))
> 
>      print('q')
>      ==========================
> 
>      ===== gen-test-images.sh =====
>      #!/bin/bash
> 
>      IMG_PATH=/ssd
> 
>      for name in seq reverse rand part-rand; do
>          IMG=$IMG_PATH/t-$name.qcow2
>          echo creating $IMG ...
>          rm -f $IMG
>          qemu-img create -f qcow2 $IMG 4G
>          gen-writes $name | qemu-io $IMG
>      done
>      ==============================
> 
> 
> Vladimir Sementsov-Ogievskiy (5):
>    tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache
>    block: introduce aio task pool
>    block/qcow2: refactor qcow2_co_preadv_part
>    block/qcow2: refactor qcow2_co_pwritev_part
>    block/qcow2: introduce parallel subrequest handling in read and write
> 
>   qapi/block-core.json               |   2 +-
>   block/qcow2.h                      |   3 +
>   include/block/aio_task.h           |  54 ++++
>   block/aio_task.c                   | 124 ++++++++
>   block/qcow2.c                      | 461 +++++++++++++++++++----------
>   block/Makefile.objs                |   2 +
>   block/trace-events                 |   1 +
>   tests/qemu-iotests/026.out         |  18 +-
>   tests/qemu-iotests/026.out.nocache | 188 ++++++------
>   9 files changed, 591 insertions(+), 262 deletions(-)
>   create mode 100644 include/block/aio_task.h
>   create mode 100644 block/aio_task.c
> 
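
The table quoted above can be reduced to master/async ratios to make the
comparison easier to scan. The following is an editorial sketch in Python:
the numbers are copied from the table, while the function names and the 0.95
threshold used to flag a slowdown are arbitrary choices, not part of the
original measurements.

```python
# (master, async) run times copied from the results table.
RESULTS = {
    "/ssd/t-part-rand.qcow2": (14.671, 9.193),
    "/ssd/t-part-rand.qcow2 -w": (11.434, 8.621),
    "/ssd/t-rand.qcow2": (20.421, 10.05),
    "/ssd/t-rand.qcow2 -w": (11.097, 8.915),
    "/ssd/t-reverse.qcow2": (17.515, 9.407),
    "/ssd/t-reverse.qcow2 -w": (11.255, 8.649),
    "/ssd/t-seq.qcow2": (9.081, 9.072),
    "/ssd/t-seq.qcow2 -w": (8.761, 8.747),
    "/tmp/t-part-rand.qcow2": (41.179, 41.37),
    "/tmp/t-part-rand.qcow2 -w": (54.097, 55.323),
    "/tmp/t-rand.qcow2": (711.899, 514.339),
    "/tmp/t-rand.qcow2 -w": (546.259, 642.114),
    "/tmp/t-reverse.qcow2": (86.065, 96.522),
    "/tmp/t-reverse.qcow2 -w": (46.557, 48.499),
    "/tmp/t-seq.qcow2": (33.804, 33.862),
    "/tmp/t-seq.qcow2 -w": (34.299, 34.233),
}


def speedup(master, async_time):
    """Ratio above 1.0 means the async series is faster than master."""
    return master / async_time


def slowdowns(results, threshold=0.95):
    """Return the cases whose ratio falls below the (arbitrary) threshold."""
    return sorted(name for name, (m, a) in results.items()
                  if speedup(m, a) < threshold)


if __name__ == "__main__":
    for name, (m, a) in RESULTS.items():
        print("{:28s} {:.2f}".format(name, speedup(m, a)))
    print("below threshold:", slowdowns(RESULTS))
```

With these numbers the largest gain is for ssd reads of t-rand (ratio about
2.03), and the two rows below the threshold are /tmp/t-rand.qcow2 -w and
/tmp/t-reverse.qcow2.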


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [Qemu-block] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 1/5] tests/qemu-iotests: Fix qemu-io related output in 026.out.nocache Vladimir Sementsov-Ogievskiy
@ 2019-09-09 21:58   ` " John Snow
  0 siblings, 0 replies; 17+ messages in thread
From: John Snow @ 2019-09-09 21:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, qemu-devel, armbru, mreitz, stefanha, den



On 8/16/19 11:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> qemu-io now prefixes its errors and warnings with "qemu-io:".
> 36b9986b08787019e fixed a lot of iotests output but forgot about
> 026.out.nocache. Fix it too.
> 
> Fixes: 99e98d7c9fc1a1639fad ("qemu-io: Use error_[gs]et_progname()")
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/026.out.nocache | 168 ++++++++++++++---------------
>  1 file changed, 84 insertions(+), 84 deletions(-)
> 

Thanks!

Reviewed-by: John Snow <jsnow@redhat.com>

--js



* Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io
  2019-08-16 15:30 [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2019-09-03 12:16 ` [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io Vladimir Sementsov-Ogievskiy
@ 2019-09-13  8:58 ` Max Reitz
  2019-09-13  9:11   ` Vladimir Sementsov-Ogievskiy
  2019-09-16 13:26   ` Max Reitz
  6 siblings, 2 replies; 17+ messages in thread
From: Max Reitz @ 2019-09-13  8:58 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, qemu-devel, armbru, stefanha, den


On 16.08.19 17:30, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is an asynchronous scheme for handling fragmented qcow2
> reads and writes. Both the qcow2 read and write functions loop through
> sequential portions of data. The series aims to parallelize these
> loop iterations.
> It improves performance for fragmented qcow2 images; I've tested it
> as described below.

Thanks, I’ve changed two things:
- Replaced assert((x & (BDRV_SECTOR_SIZE - 1)) == 0) by
  assert(QEMU_IS_ALIGNED(x, BDRV_SECTOR_SIZE)) in patch 3 (conflict with
  “block: Use QEMU_IS_ALIGNED”), and
- Replaced the remaining instance of “qcow2_co_do_pwritev()” by
  “qcow2_co_pwritev_task()” in a comment in patch 4

and applied the series to my block branch:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block
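
For reference, the two assertion forms agree whenever the alignment is a
power of two, which is why the replacement preserves behaviour for
BDRV_SECTOR_SIZE. A small Python sketch of the equivalence, assuming
BDRV_SECTOR_SIZE is 512 and that QEMU_IS_ALIGNED is a plain modulo check
(neither definition is shown in this thread); the helper names are
hypothetical.

```python
BDRV_SECTOR_SIZE = 512  # assumption: value not shown in this thread


def aligned_mask(x, align):
    """Old form: (x & (align - 1)) == 0, valid only for power-of-two align."""
    return (x & (align - 1)) == 0


def aligned_mod(x, align):
    """Assumed shape of QEMU_IS_ALIGNED: a modulo check."""
    return x % align == 0


def forms_agree(align, limit=4096):
    """Check both forms on every non-negative x below limit."""
    return all(aligned_mask(x, align) == aligned_mod(x, align)
               for x in range(limit))


if __name__ == "__main__":
    print("512:", forms_agree(BDRV_SECTOR_SIZE))
    print("48: ", forms_agree(48))
```

For an alignment that is not a power of two, such as 48, the mask form gives
wrong answers, so the modulo form is also the more general of the two.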

Max




* Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io
  2019-09-13  8:58 ` Max Reitz
@ 2019-09-13  9:11   ` Vladimir Sementsov-Ogievskiy
  2019-09-16 13:26   ` Max Reitz
  1 sibling, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-13  9:11 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: kwolf, Denis Lunev, qemu-devel, armbru, stefanha

13.09.2019 11:58, Max Reitz wrote:
> On 16.08.19 17:30, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is an asynchronous scheme for handling fragmented qcow2
>> reads and writes. Both the qcow2 read and write functions loop through
>> sequential portions of data. The series aims to parallelize these
>> loop iterations.
>> It improves performance for fragmented qcow2 images; I've tested it
>> as described below.
> 
> Thanks, I’ve changed two things:
> - Replaced assert((x & (BDRV_SECTOR_SIZE - 1)) == 0) by
>    assert(QEMU_IS_ALIGNED(x, BDRV_SECTOR_SIZE)) in patch 3 (conflict with
>    “block: Use QEMU_IS_ALIGNED”), and
> - Replaced the remaining instance of “qcow2_co_do_pwritev()” by
>    “qcow2_co_pwritev_task()” in a comment in patch 4
> 
> and applied the series to my block branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block
> 
> Max
> 

Thank you!!!

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part
  2019-08-16 15:30 ` [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part Vladimir Sementsov-Ogievskiy
@ 2019-09-13 10:01   ` Kevin Wolf
  2019-09-13 10:53     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2019-09-13 10:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-block, armbru, qemu-devel, mreitz, stefanha, den

Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
> A later patch will run the partial requests of the qcow2_co_preadv
> iterations in parallel for performance reasons. To prepare for this,
> move the part which may be parallelized into a separate function
> (qcow2_co_preadv_task).
> 
> While here, also move the reading of encrypted clusters into its own
> function, as is already done for compressed reading.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> ---
>  qapi/block-core.json |   2 +-
>  block/qcow2.c        | 205 +++++++++++++++++++++++--------------------
>  2 files changed, 111 insertions(+), 96 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0d43d4f37c..dd80aa11db 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3266,7 +3266,7 @@
>              'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>              'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>              'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
> -            'cor_write', 'cluster_alloc_space', 'none'] }
> +            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }

What's the point of this new blkdebug event?

Obviously, read_aio for an encrypted image must mean a read of encrypted
data. The same image can never trigger both read_aio and
read_encrypted, so why do we need to distinguish them as two different
events?

Kevin



* Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part
  2019-09-13 10:01   ` Kevin Wolf
@ 2019-09-13 10:53     ` Vladimir Sementsov-Ogievskiy
  2019-09-13 11:06       ` Max Reitz
  0 siblings, 1 reply; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-13 10:53 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Denis Lunev, qemu-block, armbru, qemu-devel, mreitz, stefanha

13.09.2019 13:01, Kevin Wolf wrote:
> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> A later patch will run the partial requests of the qcow2_co_preadv
>> iterations in parallel for performance reasons. To prepare for this,
>> move the part which may be parallelized into a separate function
>> (qcow2_co_preadv_task).
>>
>> While here, also move the reading of encrypted clusters into its own
>> function, as is already done for compressed reading.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   qapi/block-core.json |   2 +-
>>   block/qcow2.c        | 205 +++++++++++++++++++++++--------------------
>>   2 files changed, 111 insertions(+), 96 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 0d43d4f37c..dd80aa11db 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3266,7 +3266,7 @@
>>               'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>>               'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>>               'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
>> -            'cor_write', 'cluster_alloc_space', 'none'] }
>> +            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
> 
> What's the point of this new blkdebug event?
> 
> Obviously, read_aio for an encrypted image must mean a read of encrypted
> data. The same image can never trigger both read_aio and
> read_encrypted, so why do we need to distinguish them as two different
> events?
> 

It seems I did it just by analogy with qcow2_co_preadv_compressed.

Anyway, I think you are right, so I don't mind if Max drops this new event
and uses read_aio in his branch, or I can resend the series or send a follow-up,
whichever you prefer.


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part
  2019-09-13 10:53     ` Vladimir Sementsov-Ogievskiy
@ 2019-09-13 11:06       ` Max Reitz
  2019-09-13 11:34         ` Kevin Wolf
  0 siblings, 1 reply; 17+ messages in thread
From: Max Reitz @ 2019-09-13 11:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
  Cc: Denis Lunev, qemu-block, qemu-devel, armbru, stefanha


On 13.09.19 12:53, Vladimir Sementsov-Ogievskiy wrote:
> 13.09.2019 13:01, Kevin Wolf wrote:
>> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> A later patch will run the partial requests of the qcow2_co_preadv
>>> iterations in parallel for performance reasons. To prepare for this,
>>> move the part which may be parallelized into a separate function
>>> (qcow2_co_preadv_task).
>>>
>>> While here, also move the reading of encrypted clusters into its own
>>> function, as is already done for compressed reading.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   qapi/block-core.json |   2 +-
>>>   block/qcow2.c        | 205 +++++++++++++++++++++++--------------------
>>>   2 files changed, 111 insertions(+), 96 deletions(-)
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 0d43d4f37c..dd80aa11db 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -3266,7 +3266,7 @@
>>>               'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>>>               'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>>>               'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
>>> -            'cor_write', 'cluster_alloc_space', 'none'] }
>>> +            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
>>
>> What's the point of this new blkdebug event?
>>
>> Obviously, read_aio for an encrypted image must mean a read of encrypted
>> data. The same image can never trigger both read_aio and
>> read_encrypted, so why do we need to distinguish them as two different
>> events?
>>
> 
> It seems I did it just by analogy with qcow2_co_preadv_compressed.
> 
> Anyway, I think you are right, so I don't mind if Max drops this new event
> and uses read_aio in his branch, or I can resend the series or send a follow-up,
> whichever you prefer.

Should I squash this in?

diff --git a/qapi/block-core.json b/qapi/block-core.json
index d9ae73a43c..e6edd641f1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3264,7 +3264,7 @@
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
             'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
             'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
-            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
+            'cor_write', 'cluster_alloc_space', 'none'] }

 ##
 # @BlkdebugIOType:
diff --git a/block/qcow2.c b/block/qcow2.c
index b5fe014b20..c07ce84d54 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2001,7 +2001,7 @@ qcow2_co_preadv_encrypted(BlockDriverState *bs,
         return -ENOMEM;
     }

-    BLKDBG_EVENT(bs->file, BLKDBG_READ_ENCRYPTED);
+    BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
     ret = bdrv_co_pread(s->data_file,
                         file_cluster_offset + offset_into_cluster(s, offset),
                         bytes, buf, 0);




* Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part
  2019-09-13 11:06       ` Max Reitz
@ 2019-09-13 11:34         ` Kevin Wolf
  2019-09-13 11:55           ` Max Reitz
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2019-09-13 11:34 UTC (permalink / raw)
  To: Max Reitz
  Cc: Vladimir Sementsov-Ogievskiy, Denis Lunev, qemu-block,
	qemu-devel, armbru, stefanha


Am 13.09.2019 um 13:06 hat Max Reitz geschrieben:
> On 13.09.19 12:53, Vladimir Sementsov-Ogievskiy wrote:
> > 13.09.2019 13:01, Kevin Wolf wrote:
> >> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>> Further patch will run partial requests of iterations of
> >>> qcow2_co_preadv in parallel for performance reasons. To prepare for
> >>> this, separate part which may be parallelized into separate function
> >>> (qcow2_co_preadv_task).
> >>>
> >>> While being here, also separate encrypted clusters reading to own
> >>> function, like it is done for compressed reading.
> >>>
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>> Reviewed-by: Max Reitz <mreitz@redhat.com>
> >>> ---
> >>>   qapi/block-core.json |   2 +-
> >>>   block/qcow2.c        | 205 +++++++++++++++++++++++--------------------
> >>>   2 files changed, 111 insertions(+), 96 deletions(-)
> >>>
> >>> diff --git a/qapi/block-core.json b/qapi/block-core.json
> >>> index 0d43d4f37c..dd80aa11db 100644
> >>> --- a/qapi/block-core.json
> >>> +++ b/qapi/block-core.json
> >>> @@ -3266,7 +3266,7 @@
> >>>               'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
> >>>               'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
> >>>               'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
> >>> -            'cor_write', 'cluster_alloc_space', 'none'] }
> >>> +            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
> >>
> >> What's the point of this new blkdebug event?
> >>
> >> Obviously, read_aio for an encrypted image must mean a read of encrypted
> >> data. The same image can never trigger both read_aio and
> >> read_encrypted, so why do we need to distinguish them as two different
> >> events?
> >>
> > 
> > Seems I just done it looking at qcow2_co_preadv_compressed..
> > 
> > Anyway, I think you are right, so, I don't mind if Max drops this new event
> > and use read_aio in his branch, or I can resend the series or send a follow-up,
> > whichever you prefer.
> 
> Should I squash this in?

Looks good to me.



* Re: [Qemu-devel] [PATCH v4 3/5] block/qcow2: refactor qcow2_co_preadv_part
  2019-09-13 11:34         ` Kevin Wolf
@ 2019-09-13 11:55           ` Max Reitz
  0 siblings, 0 replies; 17+ messages in thread
From: Max Reitz @ 2019-09-13 11:55 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Vladimir Sementsov-Ogievskiy, Denis Lunev, qemu-block,
	qemu-devel, armbru, stefanha

On 13.09.19 13:34, Kevin Wolf wrote:
> Am 13.09.2019 um 13:06 hat Max Reitz geschrieben:
>> On 13.09.19 12:53, Vladimir Sementsov-Ogievskiy wrote:
>>> 13.09.2019 13:01, Kevin Wolf wrote:
>>>> Am 16.08.2019 um 17:30 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> Further patch will run partial requests of iterations of
>>>>> qcow2_co_preadv in parallel for performance reasons. To prepare for
>>>>> this, separate part which may be parallelized into separate function
>>>>> (qcow2_co_preadv_task).
>>>>>
>>>>> While being here, also separate encrypted clusters reading to own
>>>>> function, like it is done for compressed reading.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> Reviewed-by: Max Reitz <mreitz@redhat.com>
>>>>> ---
>>>>>   qapi/block-core.json |   2 +-
>>>>>   block/qcow2.c        | 205 +++++++++++++++++++++++--------------------
>>>>>   2 files changed, 111 insertions(+), 96 deletions(-)
>>>>>
>>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>>> index 0d43d4f37c..dd80aa11db 100644
>>>>> --- a/qapi/block-core.json
>>>>> +++ b/qapi/block-core.json
>>>>> @@ -3266,7 +3266,7 @@
>>>>>               'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
>>>>>               'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
>>>>>               'l1_shrink_write_table', 'l1_shrink_free_l2_clusters',
>>>>> -            'cor_write', 'cluster_alloc_space', 'none'] }
>>>>> +            'cor_write', 'cluster_alloc_space', 'none', 'read_encrypted'] }
>>>>
>>>> What's the point of this new blkdebug event?
>>>>
>>>> Obviously, read_aio for an encrypted image must mean a read of encrypted
>>>> data. The same image can never trigger both read_aio and
>>>> read_encrypted, so why do we need to distinguish them as two different
>>>> events?
>>>>
>>>
>>> Seems I just done it looking at qcow2_co_preadv_compressed..
>>>
>>> Anyway, I think you are right, so, I don't mind if Max drops this new event
>>> and use read_aio in his branch, or I can resend the series or send a follow-up,
>>> whichever you prefer.
>>
>> Should I squash this in?
> 
> Looks good to me.

OK, done.

Max



* Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io
  2019-09-13  8:58 ` Max Reitz
  2019-09-13  9:11   ` Vladimir Sementsov-Ogievskiy
@ 2019-09-16 13:26   ` Max Reitz
  2019-09-16 15:29     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 17+ messages in thread
From: Max Reitz @ 2019-09-16 13:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, qemu-devel, armbru, stefanha, den

On 13.09.19 10:58, Max Reitz wrote:
> On 16.08.19 17:30, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is an asynchronous scheme for handling fragmented qcow2
>> reads and writes. Both qcow2 read and write functions loops through
>> sequential portions of data. The series aim it to parallelize these
>> loops iterations.
>> It improves performance for fragmented qcow2 images, I've tested it
>> as described below.
> 
> Thanks, I’ve changed two things:
> - Replaced assert((x & (BDRV_SECTOR_SIZE - 1)) == 0) by
>   assert(QEMU_IS_ALIGNED(x, BDRV_SECTOR_SIZE)) in patch 3 (conflict with
>   “block: Use QEMU_IS_ALIGNED”), and
> - Replaced the remaining instance of “qcow2_co_do_pwritev()” by
>   “qcow2_co_pwritev_task()” in a comment in patch 4
> 
> and applied the series to my block branch:
> 
> https://git.xanclic.moe/XanClic/qemu/commits/branch/block

Unfortunately, I’ll have to unstage the series for now because the fix
to 026’s reference output isn’t stable.

When running the test in parallel (I can reproduce it with four
instances on my machine with two cores + HT), I get failures like:

026      fail       [15:21:09] [15:21:37]      (last: 18s)   output mismatch (see 026.out.bad)
--- tests/qemu-iotests/026.out 2019-09-16 14:49:20.720410701 +0200
+++ tests/qemu-iotests/026.out.bad       2019-09-16 15:21:37.180711936 +0200
@@ -563,7 +563,7 @@
 qemu-io: Failed to flush the refcount block cache: No space left on device
 write failed: No space left on device

-74 leaked clusters were found on the image.
+522 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824

Failures: 026
Failed 1 of 1 iotests

Max



* Re: [Qemu-devel] [PATCH v4 0/5] qcow2: async handling of fragmented io
  2019-09-16 13:26   ` Max Reitz
@ 2019-09-16 15:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 17+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-09-16 15:29 UTC (permalink / raw)
  To: Max Reitz, qemu-block; +Cc: kwolf, Denis Lunev, qemu-devel, armbru, stefanha

16.09.2019 16:26, Max Reitz wrote:
> On 13.09.19 10:58, Max Reitz wrote:
>> On 16.08.19 17:30, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> Here is an asynchronous scheme for handling fragmented qcow2
>>> reads and writes. Both qcow2 read and write functions loops through
>>> sequential portions of data. The series aim it to parallelize these
>>> loops iterations.
>>> It improves performance for fragmented qcow2 images, I've tested it
>>> as described below.
>>
>> Thanks, I’ve changed two things:
>> - Replaced assert((x & (BDRV_SECTOR_SIZE - 1)) == 0) by
>>    assert(QEMU_IS_ALIGNED(x, BDRV_SECTOR_SIZE)) in patch 3 (conflict with
>>    “block: Use QEMU_IS_ALIGNED”), and
>> - Replaced the remaining instance of “qcow2_co_do_pwritev()” by
>>    “qcow2_co_pwritev_task()” in a comment in patch 4
>>
>> and applied the series to my block branch:
>>
>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block
> 
> Unfortunately, I’ll have to unstage the series for now because the fix
> to 026’s reference output isn’t stable.
> 
> When running the test in parallel (I can reproduce it with four
> instances on my machine with two cores + HT), I get failures like:
> 
> 026      fail       [15:21:09] [15:21:37]      (last: 18s)   output mismatch (see 026.out.bad)
> --- tests/qemu-iotests/026.out 2019-09-16 14:49:20.720410701 +0200
> +++ tests/qemu-iotests/026.out.bad       2019-09-16 15:21:37.180711936 +0200
> @@ -563,7 +563,7 @@
>   qemu-io: Failed to flush the refcount block cache: No space left on device
>   write failed: No space left on device
> 
> -74 leaked clusters were found on the image.
> +522 leaked clusters were found on the image.
>   This means waste of disk space, but no harm to data.
>   Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
> 
> Failures: 026
> Failed 1 of 1 iotests
> 

Unfortunate indeed :)

Hmm, I can't reproduce it, but I'm inclined to fix this by simply filtering out the
information about leaked clusters in this test: there is no sense in tracking it for
failure paths, given the newly introduced async handling of request parts.

-- 
Best regards,
Vladimir


