All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/15] preallocate filter
@ 2020-09-18 18:19 Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING Vladimir Sementsov-Ogievskiy
                   ` (14 more replies)
  0 siblings, 15 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Hi all!

Here is a filter, which does preallocation on write.

In Virtuozzo we have to deal with some custom distributed storage
solution, where allocation is very-very expensive operation. We have to
workaround it in Qemu, so here is a new filter.

Still, the filter shows good results for me even for xfs and ext4.

Here are results, produced by new benchmark (last 4 patches):

All results are in iops (larger means better)

----------------------------------  -----------  -----------
                                    A            B
                                    no-prealloc  prealloc
ssd-ext4, aligned sequential 16k    19934±1.2%   27108±0.27%
                                                  A+36±2%
ssd-xfs, aligned sequential 16k     15528±5.5%   25953±3.3%
                                                  A+67±11%
hdd-ext4, aligned sequential 16k    5079±29%     3165±11%
                                                  A-38±36%
hdd-xfs, aligned sequential 16k     4096±95%     3321±7.6%
                                                  A-19±101%
ssd-ext4, unaligned sequential 64k  19969±1.9%   27043±0.49%
                                                  A+35±3%
ssd-xfs, unaligned sequential 64k   15403±2.8%   25725±6.4%
                                                  A+67±13%
hdd-ext4, unaligned sequential 64k  5250±17%     3239±8.7%
                                                  A-38±23%
hdd-xfs, unaligned sequential 64k   5291±8.2%    3336±4.2%
                                                  A-37±11%
----------------------------------  -----------  -----------

Note: it's on Fedora 30, kernel 5.6.13-100.fc30.x86_64

The tests are actually qemu-img bench, run like:

  ./qemu-img create -f qcow2 $img 16G

aligned:
  ./qemu-img bench -c 10000 -d 64 -f qcow2  -s 16k -t none -n -w $img

unaligned
  ./qemu-img bench -c 10000 -d 64 -f qcow2 -o 1k -s 64k -t none -n -w $img

and for preallocation, you'll drop -f qcow2, add --image-opts, and
instead of just $img use
  driver=qcow2,file.driver=preallocate,file.file.driver=file,file.file.filename=$img 

v6:
05: add Max's r-b
06: add Max's r-b
07: new
08: Changed a lot. really. no .active now, support more use-cases.
    Somehow, now I see performance benefit on xfs too :)
    probably due to .zero_start feature.
09: new
10: new
11: mostly rewritten, a lot more cases, drop r-b
12-15: new, to produce final benchmark table

Vladimir Sementsov-Ogievskiy (15):
  block: simplify comment to BDRV_REQ_SERIALISING
  block/io.c: drop assertion on double waiting for request serialisation
  block/io: split out bdrv_find_conflicting_request
  block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
  block: bdrv_mark_request_serialising: split non-waiting function
  block: introduce BDRV_REQ_NO_WAIT flag
  block: bdrv_check_perm(): process children anyway
  block: introduce preallocate filter
  qemu-io: add preallocate mode parameter for truncate command
  iotests: qemu_io_silent: support --image-opts
  iotests: add 298 to test new preallocate filter driver
  scripts/simplebench: support iops
  scripts/simplebench: improve view of ascii table
  scripts/simplebench: improve ascii table: add difference line
  scripts/simplebench: add bench_prealloc.py

 docs/system/qemu-block-drivers.rst.inc |  26 ++
 qapi/block-core.json                   |  20 +-
 include/block/block.h                  |  20 +-
 include/block/block_int.h              |   3 +-
 block.c                                |  10 +-
 block/file-posix.c                     |   2 +-
 block/io.c                             | 130 +++---
 block/preallocate.c                    | 556 +++++++++++++++++++++++++
 qemu-io-cmds.c                         |  46 +-
 block/meson.build                      |   1 +
 scripts/simplebench/bench_prealloc.py  | 128 ++++++
 scripts/simplebench/simplebench.py     | 103 ++++-
 tests/qemu-iotests/298                 | 186 +++++++++
 tests/qemu-iotests/298.out             |   5 +
 tests/qemu-iotests/group               |   1 +
 tests/qemu-iotests/iotests.py          |   7 +-
 16 files changed, 1146 insertions(+), 98 deletions(-)
 create mode 100644 block/preallocate.c
 create mode 100755 scripts/simplebench/bench_prealloc.py
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

-- 
2.21.3



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-28 15:34   ` Alberto Garcia
  2020-09-18 18:19 ` [PATCH v6 02/15] block/io.c: drop assertion on double waiting for request serialisation Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

1. BDRV_REQ_NO_SERIALISING doesn't exist already, don't mention it.

2. We are going to add one more user of BDRV_REQ_SERIALISING, so
   comment about backup becomes a bit confusing here. The use case in
   backup is documented in block/backup.c, so let's just drop
   duplication here.

3. The fact that BDRV_REQ_SERIALISING is only for write requests is
   omitted. Add a note.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/block.h | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b314..ef948e3f34 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -53,16 +53,7 @@ typedef enum {
      * content. */
     BDRV_REQ_WRITE_UNCHANGED    = 0x40,
 
-    /*
-     * BDRV_REQ_SERIALISING forces request serialisation for writes.
-     * It is used to ensure that writes to the backing file of a backup process
-     * target cannot race with a read of the backup target that defers to the
-     * backing file.
-     *
-     * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to
-     * BDRV_REQ_NO_SERIALISING. A more descriptive name for the latter might be
-     * _DO_NOT_WAIT_FOR_SERIALISING, except that is too long.
-     */
+    /* Forces request serialisation. Use only with write requests. */
     BDRV_REQ_SERIALISING        = 0x80,
 
     /* Execute the request only if the operation can be offloaded or otherwise
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 02/15] block/io.c: drop assertion on double waiting for request serialisation
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 03/15] block/io: split out bdrv_find_conflicting_request Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den, Paolo Bonzini

The comments states, that on misaligned request we should have already
been waiting. But for bdrv_padding_rmw_read, we called
bdrv_mark_request_serialising with align = request_alignment, and now
we serialise with align = cluster_size. So we may have to wait again
with larger alignment.

Note, that the only user of BDRV_REQ_SERIALISING is backup which issues
cluster-aligned requests, so seems the assertion should not fire for
now. But it's wrong anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/io.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index a2389bb38c..67617bb9b2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1881,7 +1881,6 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
                           BdrvTrackedRequest *req, int flags)
 {
     BlockDriverState *bs = child->bs;
-    bool waited;
     int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE);
 
     if (bs->read_only) {
@@ -1893,15 +1892,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
     assert(!(flags & ~BDRV_REQ_MASK));
 
     if (flags & BDRV_REQ_SERIALISING) {
-        waited = bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
-        /*
-         * For a misaligned request we should have already waited earlier,
-         * because we come after bdrv_padding_rmw_read which must be called
-         * with the request already marked as serialising.
-         */
-        assert(!waited ||
-               (req->offset == req->overlap_offset &&
-                req->bytes == req->overlap_bytes));
+        bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
     } else {
         bdrv_wait_serialising_requests(req);
     }
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 03/15] block/io: split out bdrv_find_conflicting_request
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 02/15] block/io.c: drop assertion on double waiting for request serialisation Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 04/15] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

To be reused in separate.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 71 +++++++++++++++++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/block/io.c b/block/io.c
index 67617bb9b2..c58fd36091 100644
--- a/block/io.c
+++ b/block/io.c
@@ -727,43 +727,54 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
     return true;
 }
 
+/* Called with self->bs->reqs_lock held */
+static BdrvTrackedRequest *
+bdrv_find_conflicting_request(BdrvTrackedRequest *self)
+{
+    BdrvTrackedRequest *req;
+
+    QLIST_FOREACH(req, &self->bs->tracked_requests, list) {
+        if (req == self || (!req->serialising && !self->serialising)) {
+            continue;
+        }
+        if (tracked_request_overlaps(req, self->overlap_offset,
+                                     self->overlap_bytes))
+        {
+            /*
+             * Hitting this means there was a reentrant request, for
+             * example, a block driver issuing nested requests.  This must
+             * never happen since it means deadlock.
+             */
+            assert(qemu_coroutine_self() != req->co);
+
+            /*
+             * If the request is already (indirectly) waiting for us, or
+             * will wait for us as soon as it wakes up, then just go on
+             * (instead of producing a deadlock in the former case).
+             */
+            if (!req->waiting_for) {
+                return req;
+            }
+        }
+    }
+
+    return NULL;
+}
+
 static bool coroutine_fn
 bdrv_wait_serialising_requests_locked(BlockDriverState *bs,
                                       BdrvTrackedRequest *self)
 {
     BdrvTrackedRequest *req;
-    bool retry;
     bool waited = false;
 
-    do {
-        retry = false;
-        QLIST_FOREACH(req, &bs->tracked_requests, list) {
-            if (req == self || (!req->serialising && !self->serialising)) {
-                continue;
-            }
-            if (tracked_request_overlaps(req, self->overlap_offset,
-                                         self->overlap_bytes))
-            {
-                /* Hitting this means there was a reentrant request, for
-                 * example, a block driver issuing nested requests.  This must
-                 * never happen since it means deadlock.
-                 */
-                assert(qemu_coroutine_self() != req->co);
-
-                /* If the request is already (indirectly) waiting for us, or
-                 * will wait for us as soon as it wakes up, then just go on
-                 * (instead of producing a deadlock in the former case). */
-                if (!req->waiting_for) {
-                    self->waiting_for = req;
-                    qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
-                    self->waiting_for = NULL;
-                    retry = true;
-                    waited = true;
-                    break;
-                }
-            }
-        }
-    } while (retry);
+    while ((req = bdrv_find_conflicting_request(self))) {
+        self->waiting_for = req;
+        qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+        self->waiting_for = NULL;
+        waited = true;
+    }
+
     return waited;
 }
 
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 04/15] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 03/15] block/io: split out bdrv_find_conflicting_request Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 05/15] block: bdrv_mark_request_serialising: split non-waiting function Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

bs is linked in req, so no needs to pass it separately. Most of
tracked-requests API doesn't have bs argument. Actually, after this
patch only tracked_request_begin has it, but it's for purpose.

While being here, also add a comment about what "_locked" is.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index c58fd36091..ab9ef7fd1a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -761,16 +761,16 @@ bdrv_find_conflicting_request(BdrvTrackedRequest *self)
     return NULL;
 }
 
+/* Called with self->bs->reqs_lock held */
 static bool coroutine_fn
-bdrv_wait_serialising_requests_locked(BlockDriverState *bs,
-                                      BdrvTrackedRequest *self)
+bdrv_wait_serialising_requests_locked(BdrvTrackedRequest *self)
 {
     BdrvTrackedRequest *req;
     bool waited = false;
 
     while ((req = bdrv_find_conflicting_request(self))) {
         self->waiting_for = req;
-        qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+        qemu_co_queue_wait(&req->wait_queue, &self->bs->reqs_lock);
         self->waiting_for = NULL;
         waited = true;
     }
@@ -794,7 +794,7 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 
     req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
     req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
-    waited = bdrv_wait_serialising_requests_locked(bs, req);
+    waited = bdrv_wait_serialising_requests_locked(req);
     qemu_co_mutex_unlock(&bs->reqs_lock);
     return waited;
 }
@@ -876,7 +876,7 @@ static bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self
     }
 
     qemu_co_mutex_lock(&bs->reqs_lock);
-    waited = bdrv_wait_serialising_requests_locked(bs, self);
+    waited = bdrv_wait_serialising_requests_locked(self);
     qemu_co_mutex_unlock(&bs->reqs_lock);
 
     return waited;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 05/15] block: bdrv_mark_request_serialising: split non-waiting function
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 04/15] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 06/15] block: introduce BDRV_REQ_NO_WAIT flag Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

We'll need a separate function, which will only "mark" request
serialising with specified align but not wait for conflicting
requests. So, it will be like old bdrv_mark_request_serialising(),
before merging bdrv_wait_serialising_requests_locked() into it.

To reduce the possible mess, let's do the following:

Public function that does both marking and waiting will be called
bdrv_make_request_serialising, and private function which will only
"mark" will be called tracked_request_set_serialising().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  3 ++-
 block/file-posix.c        |  2 +-
 block/io.c                | 35 +++++++++++++++++++++++------------
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38cad9d15c..887b0668d8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1052,7 +1052,8 @@ extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
 void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent);
 
-bool coroutine_fn bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align);
+bool coroutine_fn bdrv_make_request_serialising(BdrvTrackedRequest *req,
+                                                uint64_t align);
 BdrvTrackedRequest *coroutine_fn bdrv_co_get_self_request(BlockDriverState *bs);
 
 int get_tmp_filename(char *filename, int size);
diff --git a/block/file-posix.c b/block/file-posix.c
index c63926d592..37d9266f6a 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2953,7 +2953,7 @@ raw_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes,
         req->bytes = end - req->offset;
         req->overlap_bytes = req->bytes;
 
-        bdrv_mark_request_serialising(req, bs->bl.request_alignment);
+        bdrv_make_request_serialising(req, bs->bl.request_alignment);
     }
 #endif
 
diff --git a/block/io.c b/block/io.c
index ab9ef7fd1a..9b148bb8ea 100644
--- a/block/io.c
+++ b/block/io.c
@@ -778,15 +778,14 @@ bdrv_wait_serialising_requests_locked(BdrvTrackedRequest *self)
     return waited;
 }
 
-bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
+/* Called with req->bs->reqs_lock held */
+static void tracked_request_set_serialising(BdrvTrackedRequest *req,
+                                            uint64_t align)
 {
-    BlockDriverState *bs = req->bs;
     int64_t overlap_offset = req->offset & ~(align - 1);
     uint64_t overlap_bytes = ROUND_UP(req->offset + req->bytes, align)
                                - overlap_offset;
-    bool waited;
 
-    qemu_co_mutex_lock(&bs->reqs_lock);
     if (!req->serialising) {
         atomic_inc(&req->bs->serialising_in_flight);
         req->serialising = true;
@@ -794,9 +793,6 @@ bool bdrv_mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
 
     req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
     req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
-    waited = bdrv_wait_serialising_requests_locked(req);
-    qemu_co_mutex_unlock(&bs->reqs_lock);
-    return waited;
 }
 
 /**
@@ -882,6 +878,21 @@ static bool coroutine_fn bdrv_wait_serialising_requests(BdrvTrackedRequest *self
     return waited;
 }
 
+bool coroutine_fn bdrv_make_request_serialising(BdrvTrackedRequest *req,
+                                                uint64_t align)
+{
+    bool waited;
+
+    qemu_co_mutex_lock(&req->bs->reqs_lock);
+
+    tracked_request_set_serialising(req, align);
+    waited = bdrv_wait_serialising_requests_locked(req);
+
+    qemu_co_mutex_unlock(&req->bs->reqs_lock);
+
+    return waited;
+}
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
                                    size_t size)
 {
@@ -1492,7 +1503,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
          * with each other for the same cluster.  For example, in copy-on-read
          * it ensures that the CoR read and write operations are atomic and
          * guest writes cannot interleave between them. */
-        bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
+        bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
     } else {
         bdrv_wait_serialising_requests(req);
     }
@@ -1903,7 +1914,7 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
     assert(!(flags & ~BDRV_REQ_MASK));
 
     if (flags & BDRV_REQ_SERIALISING) {
-        bdrv_mark_request_serialising(req, bdrv_get_cluster_size(bs));
+        bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
     } else {
         bdrv_wait_serialising_requests(req);
     }
@@ -2069,7 +2080,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
 
     padding = bdrv_init_padding(bs, offset, bytes, &pad);
     if (padding) {
-        bdrv_mark_request_serialising(req, align);
+        bdrv_make_request_serialising(req, align);
 
         bdrv_padding_rmw_read(child, req, &pad, true);
 
@@ -2183,7 +2194,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
     }
 
     if (bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad)) {
-        bdrv_mark_request_serialising(&req, align);
+        bdrv_make_request_serialising(&req, align);
         bdrv_padding_rmw_read(child, &req, &pad, false);
     }
 
@@ -3344,7 +3355,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, bool exact,
      * new area, we need to make sure that no write requests are made to it
      * concurrently or they might be overwritten by preallocation. */
     if (new_bytes) {
-        bdrv_mark_request_serialising(&req, 1);
+        bdrv_make_request_serialising(&req, 1);
     }
     if (bs->read_only) {
         error_setg(errp, "Image is read-only");
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 06/15] block: introduce BDRV_REQ_NO_WAIT flag
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 05/15] block: bdrv_mark_request_serialising: split non-waiting function Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Add flag to make serialising request no wait: if there are conflicting
requests, just return error immediately. It's will be used in upcoming
preallocate filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block.h |  9 ++++++++-
 block/io.c            | 11 ++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index ef948e3f34..e7188fea05 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -67,8 +67,15 @@ typedef enum {
      * written to qiov parameter which may be NULL.
      */
     BDRV_REQ_PREFETCH  = 0x200,
+
+    /*
+     * If we need to wait for other requests, just fail immediately. Used
+     * only together with BDRV_REQ_SERIALISING.
+     */
+    BDRV_REQ_NO_WAIT = 0x400,
+
     /* Mask of valid flags */
-    BDRV_REQ_MASK               = 0x3ff,
+    BDRV_REQ_MASK               = 0x7ff,
 } BdrvRequestFlags;
 
 typedef struct BlockSizes {
diff --git a/block/io.c b/block/io.c
index 9b148bb8ea..fdcac4888e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1912,9 +1912,18 @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes,
     assert(!(bs->open_flags & BDRV_O_INACTIVE));
     assert((bs->open_flags & BDRV_O_NO_IO) == 0);
     assert(!(flags & ~BDRV_REQ_MASK));
+    assert(!((flags & BDRV_REQ_NO_WAIT) && !(flags & BDRV_REQ_SERIALISING)));
 
     if (flags & BDRV_REQ_SERIALISING) {
-        bdrv_make_request_serialising(req, bdrv_get_cluster_size(bs));
+        QEMU_LOCK_GUARD(&bs->reqs_lock);
+
+        tracked_request_set_serialising(req, bdrv_get_cluster_size(bs));
+
+        if ((flags & BDRV_REQ_NO_WAIT) && bdrv_find_conflicting_request(req)) {
+            return -EBUSY;
+        }
+
+        bdrv_wait_serialising_requests_locked(req);
     } else {
         bdrv_wait_serialising_requests(req);
     }
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 06/15] block: introduce BDRV_REQ_NO_WAIT flag Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-24 14:25   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 08/15] block: introduce preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Do generic processing even for drivers which define .bdrv_check_perm
handler. It's needed for further preallocate filter: it will need to do
additional action on bdrv_check_perm, but don't want to reimplement
generic logic.

The patch doesn't change existing behaviour: the only driver that
implements bdrv_check_perm is file-posix, but it never has any
children.

Also, bdrv_set_perm() don't stop processing if driver has
.bdrv_set_perm handler as well.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 9538af4884..165c2d3cb2 100644
--- a/block.c
+++ b/block.c
@@ -1964,8 +1964,7 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
 /*
  * Check whether permissions on this node can be changed in a way that
  * @cumulative_perms and @cumulative_shared_perms are the new cumulative
- * permissions of all its parents. This involves checking whether all necessary
- * permission changes to child nodes can be performed.
+ * permissions of all its parents.
  *
  * Will set *tighten_restrictions to true if and only if new permissions have to
  * be taken or currently shared permissions are to be unshared.  Otherwise,
@@ -2047,8 +2046,11 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
     }
 
     if (drv->bdrv_check_perm) {
-        return drv->bdrv_check_perm(bs, cumulative_perms,
-                                    cumulative_shared_perms, errp);
+        ret = drv->bdrv_check_perm(bs, cumulative_perms,
+                                   cumulative_shared_perms, errp);
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     /* Drivers that never have children can omit .bdrv_child_perm() */
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 08/15] block: introduce preallocate filter
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-24 16:50   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 docs/system/qemu-block-drivers.rst.inc |  26 ++
 qapi/block-core.json                   |  20 +-
 block/preallocate.c                    | 556 +++++++++++++++++++++++++
 block/meson.build                      |   1 +
 4 files changed, 602 insertions(+), 1 deletion(-)
 create mode 100644 block/preallocate.c

diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..60a064b232 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -952,3 +952,29 @@ on host and see if there are locks held by the QEMU process on the image file.
 More than one byte could be locked by the QEMU instance, each byte of which
 reflects a particular permission that is acquired or protected by the running
 block driver.
+
+Filter drivers
+~~~~~~~~~~~~~~
+
+QEMU supports several filter drivers, which don't store any data, but perform
+some additional tasks, hooking io requests.
+
+.. program:: filter-drivers
+.. option:: preallocate
+
+  The preallocate filter driver is intended to be inserted between format
+  and protocol nodes and preallocates some additional space
+  (expanding the protocol file) when writing past the file’s end. This can be
+  useful for file-systems with slow allocation.
+
+  Supported options:
+
+  .. program:: preallocate
+  .. option:: prealloc-align
+
+    On preallocation, align the file length to this value (in bytes), default 1M.
+
+  .. program:: preallocate
+  .. option:: prealloc-size
+
+    How much to preallocate (in bytes), default 128M.
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2d94873ca0..c8030e19b4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2819,7 +2819,7 @@
             'cloop', 'compress', 'copy-on-read', 'dmg', 'file', 'ftp', 'ftps',
             'gluster', 'host_cdrom', 'host_device', 'http', 'https', 'iscsi',
             'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
-            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
+            'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
             { 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
             'sheepdog',
             'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
@@ -3088,6 +3088,23 @@
   'data': { 'aes': 'QCryptoBlockOptionsQCow',
             'luks': 'QCryptoBlockOptionsLUKS'} }
 
+##
+# @BlockdevOptionsPreallocate:
+#
+# Filter driver intended to be inserted between format and protocol node
+# and do preallocation in protocol node on write.
+#
+# @prealloc-align: on preallocation, align file length to this number,
+#                  default 1048576 (1M)
+#
+# @prealloc-size: how much to preallocate, default 134217728 (128M)
+#
+# Since: 5.2
+##
+{ 'struct': 'BlockdevOptionsPreallocate',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { '*prealloc-align': 'int', '*prealloc-size': 'int' } }
+
 ##
 # @BlockdevOptionsQcow2:
 #
@@ -3993,6 +4010,7 @@
       'null-co':    'BlockdevOptionsNull',
       'nvme':       'BlockdevOptionsNVMe',
       'parallels':  'BlockdevOptionsGenericFormat',
+      'preallocate':'BlockdevOptionsPreallocate',
       'qcow2':      'BlockdevOptionsQcow2',
       'qcow':       'BlockdevOptionsQcow',
       'qed':        'BlockdevOptionsGenericCOWFormat',
diff --git a/block/preallocate.c b/block/preallocate.c
new file mode 100644
index 0000000000..6510ad0149
--- /dev/null
+++ b/block/preallocate.c
@@ -0,0 +1,556 @@
+/*
+ * preallocate filter driver
+ *
+ * The driver performs preallocate operation: it is injected above
+ * some node, and before each write over EOF it does additional preallocating
+ * write-zeroes request.
+ *
+ * Copyright (c) 2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/units.h"
+#include "block/block_int.h"
+
+
+typedef struct PreallocateOpts {
+    int64_t prealloc_size;
+    int64_t prealloc_align;
+} PreallocateOpts;
+
+typedef struct BDRVPreallocateState {
+    PreallocateOpts opts;
+
+    /*
+     * Track real data end, to crop preallocation on close. If < 0 the status is
+     * unknown.
+     *
+     * @data_end is a maximum of file size on open (or when we get write/resize
+     * permissions) and all write request ends after it. So it's safe to
+     * truncate to data_end if it is valid.
+     */
+    int64_t data_end;
+
+    /*
+     * Start of trailing preallocated area which reads as zero. May be smaller
+     * than data_end, if user does over-EOF write zero operation. If < 0 the
+     * status is unknown.
+     *
+     * If both @zero_start and @file_end are valid, the region
+     * [@zero_start, @file_end) is known to be preallocated zeroes. If @file_end
+     * is not valid, @zero_start doesn't make much sense.
+     */
+    int64_t zero_start;
+
+    /*
+     * Real end of file. Actually the cache for bdrv_getlength(bs->file->bs),
+     * to avoid extra lseek() calls on each write operation. If < 0 the status
+     * is unknown.
+     */
+    int64_t file_end;
+
+    /*
+     * All three states @data_end, @zero_start and @file_end are guaranteed to
+     * be invalid (< 0) when we don't have both exclusive BLK_PERM_RESIZE and
+     * BLK_PERM_WRITE permissions on file child.
+     */
+} BDRVPreallocateState;
+
+#define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align"
+#define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size"
+static QemuOptsList runtime_opts = {
+    .name = "preallocate",
+    .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+    .desc = {
+        {
+            .name = PREALLOCATE_OPT_PREALLOC_ALIGN,
+            .type = QEMU_OPT_SIZE,
+            .help = "on preallocation, align file length to this number, "
+                "default 1M",
+        },
+        {
+            .name = PREALLOCATE_OPT_PREALLOC_SIZE,
+            .type = QEMU_OPT_SIZE,
+            .help = "how much to preallocate, default 128M",
+        },
+        { /* end of list */ }
+    },
+};
+
+static bool preallocate_absorb_opts(PreallocateOpts *dest, QDict *options,
+                                    BlockDriverState *child_bs, Error **errp)
+{
+    QemuOpts *opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+
+    if (!qemu_opts_absorb_qdict(opts, options, errp)) {
+        return false;
+    }
+
+    dest->prealloc_align =
+        qemu_opt_get_size(opts, PREALLOCATE_OPT_PREALLOC_ALIGN, 1 * MiB);
+    dest->prealloc_size =
+        qemu_opt_get_size(opts, PREALLOCATE_OPT_PREALLOC_SIZE, 128 * MiB);
+
+    qemu_opts_del(opts);
+
+    if (!QEMU_IS_ALIGNED(dest->prealloc_align, BDRV_SECTOR_SIZE)) {
+        error_setg(errp, "prealloc-align parameter of preallocate filter "
+                   "is not aligned to %llu", BDRV_SECTOR_SIZE);
+        return false;
+    }
+
+    if (!QEMU_IS_ALIGNED(dest->prealloc_align,
+                         child_bs->bl.request_alignment)) {
+        error_setg(errp, "prealloc-align parameter of preallocate filter "
+                   "is not aligned to underlying node request alignment "
+                   "(%" PRIi32 ")", child_bs->bl.request_alignment);
+        return false;
+    }
+
+    return true;
+}
+
+static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
+                            Error **errp)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    /*
+     * s->data_end and friends should be initialized on permission update.
+     * For this to work, mark them invalid.
+     */
+    s->file_end = s->zero_start = s->data_end = -EINVAL;
+
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+                               false, errp);
+    if (!bs->file) {
+        return -EINVAL;
+    }
+
+    if (!preallocate_absorb_opts(&s->opts, options, bs->file->bs, errp)) {
+        return -EINVAL;
+    }
+
+    bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+        (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
+
+    bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+        ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+            bs->file->bs->supported_zero_flags);
+
+    return 0;
+}
+
+static void preallocate_close(BlockDriverState *bs)
+{
+    int ret;
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->data_end < 0) {
+        return;
+    }
+
+    if (s->file_end < 0) {
+        s->file_end = bdrv_getlength(bs->file->bs);
+        if (s->file_end < 0) {
+            return;
+        }
+    }
+
+    if (s->data_end < s->file_end) {
+        ret = bdrv_truncate(bs->file, s->data_end, true, PREALLOC_MODE_OFF, 0,
+                            NULL);
+        s->file_end = ret < 0 ? ret : s->data_end;
+    }
+}
+
+
+/*
+ * Handle reopen.
+ *
+ * We must implement reopen handlers, otherwise reopen just don't work. Handle
+ * new options and don't care about preallocation state, as it is handled in
+ * set/check permission handlers.
+ */
+
+static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
+                                      BlockReopenQueue *queue, Error **errp)
+{
+    PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
+
+    if (!preallocate_absorb_opts(opts, reopen_state->options,
+                                 reopen_state->bs->file->bs, errp)) {
+        g_free(opts);
+        return -EINVAL;
+    }
+
+    reopen_state->opaque = opts;
+
+    return 0;
+}
+
+static void preallocate_reopen_commit(BDRVReopenState *state)
+{
+    BDRVPreallocateState *s = state->bs->opaque;
+
+    s->opts = *(PreallocateOpts *)state->opaque;
+
+    g_free(state->opaque);
+    state->opaque = NULL;
+}
+
+static void preallocate_reopen_abort(BDRVReopenState *state)
+{
+    g_free(state->opaque);
+    state->opaque = NULL;
+}
+
+static coroutine_fn int preallocate_co_preadv_part(
+        BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+        QEMUIOVector *qiov, size_t qiov_offset, int flags)
+{
+    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+                               flags);
+}
+
+static int coroutine_fn preallocate_co_pdiscard(BlockDriverState *bs,
+                                               int64_t offset, int bytes)
+{
+    return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static bool can_write_resize(uint64_t perm)
+{
+    return (perm & BLK_PERM_WRITE) && (perm & BLK_PERM_RESIZE);
+}
+
+static bool has_prealloc_perms(BlockDriverState *bs)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (can_write_resize(bs->file->perm)) {
+        assert(!(bs->file->shared_perm & BLK_PERM_WRITE));
+        assert(!(bs->file->shared_perm & BLK_PERM_RESIZE));
+        return true;
+    }
+
+    assert(s->data_end < 0);
+    assert(s->zero_start < 0);
+    assert(s->file_end < 0);
+    return false;
+}
+
+/*
+ * Call on each write. Returns true if @want_merge_zero is true and the region
+ * [offset, offset + bytes) is zeroed (as a result of this call or earlier
+ * preallocation).
+ *
+ * want_merge_zero is used to merge write-zero request with preallocation in
+ * one bdrv_co_pwrite_zeroes() call.
+ */
+static bool coroutine_fn handle_write(BlockDriverState *bs, int64_t offset,
+                                      int64_t bytes, bool want_merge_zero)
+{
+    BDRVPreallocateState *s = bs->opaque;
+    int64_t end = offset + bytes;
+    int64_t prealloc_start, prealloc_end;
+    int ret;
+
+    if (!has_prealloc_perms(bs)) {
+        /* We don't have state neither should try to recover it */
+        return false;
+    }
+
+    if (s->data_end < 0) {
+        s->data_end = bdrv_getlength(bs->file->bs);
+        if (s->data_end < 0) {
+            return false;
+        }
+
+        if (s->file_end < 0) {
+            s->file_end = s->data_end;
+        }
+    }
+
+    if (end <= s->data_end) {
+        return false;
+    }
+
+    /* We have valid s->data_end, and request writes beyond it. */
+
+    s->data_end = end;
+    if (s->zero_start < 0 || !want_merge_zero) {
+        s->zero_start = end;
+    }
+
+    if (s->file_end < 0) {
+        s->file_end = bdrv_getlength(bs->file->bs);
+        if (s->file_end < 0) {
+            return false;
+        }
+    }
+
+    /* Now s->data_end, s->zero_start and s->file_end are valid. */
+
+    if (end <= s->file_end) {
+        /* No preallocation needed. */
+        return want_merge_zero && offset >= s->zero_start;
+    }
+
+    /* Now we want new preallocation, as request writes beyond s->data_end. */
+
+    prealloc_start = want_merge_zero ? MIN(offset, s->file_end) : s->file_end;
+    prealloc_end = QEMU_ALIGN_UP(offset + bytes + s->opts.prealloc_size,
+                                 s->opts.prealloc_align);
+    s->file_end = end;
+
+    ret = bdrv_co_pwrite_zeroes(
+            bs->file, prealloc_start, prealloc_end - prealloc_start,
+            BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
+    if (ret < 0) {
+        s->file_end = ret;
+        return false;
+    }
+
+    s->file_end = prealloc_end;
+    return want_merge_zero;
+}
+
+static int coroutine_fn preallocate_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int bytes, BdrvRequestFlags flags)
+{
+    bool want_merge_zero =
+        (flags & (BDRV_REQ_ZERO_WRITE | BDRV_REQ_NO_FALLBACK)) == flags;
+    if (handle_write(bs, offset, bytes, want_merge_zero)) {
+        return 0;
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static coroutine_fn int preallocate_co_pwritev_part(BlockDriverState *bs,
+                                                    uint64_t offset,
+                                                    uint64_t bytes,
+                                                    QEMUIOVector *qiov,
+                                                    size_t qiov_offset,
+                                                    int flags)
+{
+    handle_write(bs, offset, bytes, false);
+
+    return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+                                flags);
+}
+
+static int coroutine_fn
+preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
+                        bool exact, PreallocMode prealloc,
+                        BdrvRequestFlags flags, Error **errp)
+{
+    ERRP_GUARD();
+    BDRVPreallocateState *s = bs->opaque;
+    int ret;
+
+    if (s->data_end >= 0 && offset > s->data_end) {
+        if (s->file_end < 0) {
+            s->file_end = bdrv_getlength(bs->file->bs);
+            if (s->file_end < 0) {
+                error_setg(errp, "failed to get file length");
+                return s->file_end;
+            }
+        }
+
+        if (prealloc == PREALLOC_MODE_FALLOC) {
+            /*
+             * If offset <= s->file_end, the task is already done, just
+             * update s->file_end, to move part of "filter preallocation"
+             * to "preallocation requested by user".
+             * Otherwise just proceed to preallocate missing part.
+             */
+            if (offset <= s->file_end) {
+                s->data_end = offset;
+                return 0;
+            }
+        } else {
+            /*
+             * We have to drop our preallocation, to
+             * - avoid "Cannot use preallocation for shrinking files" in
+             *   case of offset < file_end
+             * - give PREALLOC_MODE_OFF a chance to keep small disk
+             *   usage
+             * - give PREALLOC_MODE_FULL a chance to actually write the
+             *   whole region as user expects
+             */
+            if (s->file_end > s->data_end) {
+                ret = bdrv_co_truncate(bs->file, s->data_end, true,
+                                       PREALLOC_MODE_OFF, 0, errp);
+                if (ret < 0) {
+                    s->file_end = ret;
+                    error_prepend(errp, "preallocate-filter: failed to drop "
+                                  "write-zero preallocation: ");
+                    return ret;
+                }
+                s->file_end = s->data_end;
+            }
+        }
+
+        s->data_end = offset;
+    }
+
+    ret = bdrv_co_truncate(bs->file, offset, exact, prealloc, flags, errp);
+    if (ret < 0) {
+        s->file_end = s->zero_start = s->data_end = ret;
+        return ret;
+    }
+
+    if (has_prealloc_perms(bs)) {
+        s->file_end = s->zero_start = s->data_end = offset;
+    }
+    return 0;
+}
+
+static int coroutine_fn preallocate_co_flush(BlockDriverState *bs)
+{
+    return bdrv_co_flush(bs->file->bs);
+}
+
+static int64_t preallocate_getlength(BlockDriverState *bs)
+{
+    int64_t ret;
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->data_end >= 0) {
+        return s->data_end;
+    }
+
+    ret = bdrv_getlength(bs->file->bs);
+
+    if (has_prealloc_perms(bs)) {
+        s->file_end = s->zero_start = s->data_end = ret;
+    }
+
+    return ret;
+}
+
+static int preallocate_check_perm(BlockDriverState *bs,
+                                  uint64_t perm, uint64_t shared, Error **errp)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->data_end >= 0 && !can_write_resize(perm)) {
+        /*
+         * Loose permissions.
+         * We should truncate in check_perm, as in set_perm bs->file->perm will
+         * be already changed, and we should not violate it.
+         */
+        if (s->file_end < 0) {
+            s->file_end = bdrv_getlength(bs->file->bs);
+            if (s->file_end < 0) {
+                error_setg(errp, "Failed to get file length");
+                return s->file_end;
+            }
+        }
+
+        if (s->data_end < s->file_end) {
+            int ret = bdrv_truncate(bs->file, s->data_end, true,
+                                    PREALLOC_MODE_OFF, 0, NULL);
+            if (ret < 0) {
+                error_setg(errp, "Failed to drop preallocation");
+                s->file_end = ret;
+                return ret;
+            }
+        }
+        /*
+         * We will drop our permissions, as well as allow shared
+         * permissions, anyone will be able to change the child, so mark
+         * all states invalid. We'll regain control if get good permissions
+         * back.
+         */
+        s->data_end = s->file_end = s->zero_start = -EINVAL;
+    }
+
+    return 0;
+}
+
+static void preallocate_set_perm(BlockDriverState *bs,
+                                 uint64_t perm, uint64_t shared)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (can_write_resize(perm) && s->data_end < 0) {
+        s->data_end = s->file_end = s->zero_start =
+            bdrv_getlength(bs->file->bs);
+    }
+}
+
+static void preallocate_child_perm(BlockDriverState *bs, BdrvChild *c,
+    BdrvChildRole role, BlockReopenQueue *reopen_queue,
+    uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared)
+{
+    bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
+
+    if (can_write_resize(perm)) {
+        /* This should come by default, but let's enforce: */
+        *nperm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+
+        /*
+         * Don't share, to keep our states s->file_end, s->data_end and
+         * s->zero_start valid.
+         */
+        *nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
+    }
+}
+
+BlockDriver bdrv_preallocate_filter = {
+    .format_name = "preallocate",
+    .instance_size = sizeof(BDRVPreallocateState),
+
+    .bdrv_getlength = preallocate_getlength,
+    .bdrv_open = preallocate_open,
+    .bdrv_close = preallocate_close,
+
+    .bdrv_reopen_prepare  = preallocate_reopen_prepare,
+    .bdrv_reopen_commit   = preallocate_reopen_commit,
+    .bdrv_reopen_abort    = preallocate_reopen_abort,
+
+    .bdrv_co_preadv_part = preallocate_co_preadv_part,
+    .bdrv_co_pwritev_part = preallocate_co_pwritev_part,
+    .bdrv_co_pwrite_zeroes = preallocate_co_pwrite_zeroes,
+    .bdrv_co_pdiscard = preallocate_co_pdiscard,
+    .bdrv_co_flush = preallocate_co_flush,
+    .bdrv_co_truncate = preallocate_co_truncate,
+
+    .bdrv_check_perm = preallocate_check_perm,
+    .bdrv_set_perm = preallocate_set_perm,
+    .bdrv_child_perm = preallocate_child_perm,
+
+    .has_variable_length = true,
+    .is_filter = true,
+};
+
+static void bdrv_preallocate_init(void)
+{
+    bdrv_register(&bdrv_preallocate_filter);
+}
+
+block_init(bdrv_preallocate_init);
diff --git a/block/meson.build b/block/meson.build
index a3e56b7cd1..367ec7a525 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -13,6 +13,7 @@ block_ss.add(files(
   'block-copy.c',
   'commit.c',
   'copy-on-read.c',
+  'preallocate.c',
   'create.c',
   'crypto.c',
   'dirty-bitmap.c',
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 08/15] block: introduce preallocate filter Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-24 17:08   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

This will be used in further test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qemu-io-cmds.c | 46 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index baeae86d8c..64f0246a71 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1698,13 +1698,42 @@ static const cmdinfo_t flush_cmd = {
     .oneline    = "flush all in-core file state to disk",
 };
 
+static int truncate_f(BlockBackend *blk, int argc, char **argv);
+static const cmdinfo_t truncate_cmd = {
+    .name       = "truncate",
+    .altname    = "t",
+    .cfunc      = truncate_f,
+    .perm       = BLK_PERM_WRITE | BLK_PERM_RESIZE,
+    .argmin     = 1,
+    .argmax     = 3,
+    .args       = "[-m prealloc_mode] off",
+    .oneline    = "truncates the current file at the given offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv)
 {
     Error *local_err = NULL;
     int64_t offset;
-    int ret;
+    int c, ret;
+    PreallocMode prealloc = PREALLOC_MODE_OFF;
 
-    offset = cvtnum(argv[1]);
+    while ((c = getopt(argc, argv, "m:")) != -1) {
+        switch (c) {
+        case 'm':
+            prealloc = qapi_enum_parse(&PreallocMode_lookup, optarg,
+                                       PREALLOC_MODE__MAX, NULL);
+            if (prealloc == PREALLOC_MODE__MAX) {
+                error_report("Invalid preallocation mode '%s'", optarg);
+                return -EINVAL;
+            }
+            break;
+        default:
+            qemuio_command_usage(&truncate_cmd);
+            return -EINVAL;
+        }
+    }
+
+    offset = cvtnum(argv[optind]);
     if (offset < 0) {
         print_cvtnum_err(offset, argv[1]);
         return offset;
@@ -1715,7 +1744,7 @@ static int truncate_f(BlockBackend *blk, int argc, char **argv)
      * exact=true.  It is better to err on the "emit more errors" side
      * than to be overly permissive.
      */
-    ret = blk_truncate(blk, offset, false, PREALLOC_MODE_OFF, 0, &local_err);
+    ret = blk_truncate(blk, offset, false, prealloc, 0, &local_err);
     if (ret < 0) {
         error_report_err(local_err);
         return ret;
@@ -1724,17 +1753,6 @@ static int truncate_f(BlockBackend *blk, int argc, char **argv)
     return 0;
 }
 
-static const cmdinfo_t truncate_cmd = {
-    .name       = "truncate",
-    .altname    = "t",
-    .cfunc      = truncate_f,
-    .perm       = BLK_PERM_WRITE | BLK_PERM_RESIZE,
-    .argmin     = 1,
-    .argmax     = 1,
-    .args       = "off",
-    .oneline    = "truncates the current file at the given offset",
-};
-
 static int length_f(BlockBackend *blk, int argc, char **argv)
 {
     int64_t size;
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-25  7:10   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 91e4a57126..3d48108f3a 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -197,7 +197,12 @@ def qemu_io_log(*args):
 
 def qemu_io_silent(*args):
     '''Run qemu-io and return the exit code, suppressing stdout'''
-    args = qemu_io_args + list(args)
+    if '-f' in args or '--image-opts' in args:
+        default_args = qemu_io_args_no_fmt
+    else:
+        default_args = qemu_io_args
+
+    args = default_args + list(args)
     exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'))
     if exitcode < 0:
         sys.stderr.write('qemu-io received signal %i: %s\n' %
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-25  8:26   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 12/15] scripts/simplebench: support iops Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/298     | 186 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/298.out |   5 +
 tests/qemu-iotests/group   |   1 +
 3 files changed, 192 insertions(+)
 create mode 100644 tests/qemu-iotests/298
 create mode 100644 tests/qemu-iotests/298.out

diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
new file mode 100644
index 0000000000..fef10f6a7a
--- /dev/null
+++ b/tests/qemu-iotests/298
@@ -0,0 +1,186 @@
+#!/usr/bin/env python3
+#
+# Test for preallocate filter
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+
+MiB = 1024 * 1024
+disk = os.path.join(iotests.test_dir, 'disk')
+overlay = os.path.join(iotests.test_dir, 'overlay')
+refdisk = os.path.join(iotests.test_dir, 'refdisk')
+drive_opts = f'node-name=disk,driver={iotests.imgfmt},' \
+    f'file.node-name=filter,file.driver=preallocate,' \
+    f'file.file.node-name=file,file.file.filename={disk}'
+
+
+class TestPreallocateBase(iotests.QMPTestCase):
+    def setUp(self):
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
+
+    def tearDown(self):
+        try:
+            self.check_small()
+            check = iotests.qemu_img_check(disk)
+            self.assertFalse('leaks' in check)
+            self.assertFalse('corruptions' in check)
+            self.assertEqual(check['check-errors'], 0)
+        finally:
+            os.remove(disk)
+
+    def check_big(self):
+        self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+
+    def check_small(self):
+        self.assertTrue(os.path.getsize(disk) < 10 * MiB)
+
+
+class TestQemuImg(TestPreallocateBase):
+    def test_qemu_img(self):
+        p = iotests.QemuIoInteractive('--image-opts', drive_opts)
+
+        p.cmd('write 0 1M')
+        p.cmd('flush')
+
+        self.check_big()
+
+        p.close()
+
+
+class TestPreallocateFilter(TestPreallocateBase):
+    def setUp(self):
+        super().setUp()
+        self.vm = iotests.VM().add_drive(path=None, opts=drive_opts)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        super().tearDown()
+
+    def test_prealloc(self):
+        self.vm.hmp_qemu_io('drive0', 'write 0 1M')
+        self.check_big()
+
+    def test_external_snapshot(self):
+        self.test_prealloc()
+
+        result = self.vm.qmp('blockdev-snapshot-sync', node_name='disk',
+                             snapshot_file=overlay,
+                             snapshot_node_name='overlay')
+        self.assert_qmp(result, 'return', {})
+
+        # on reopen to  r-o base preallocation should be dropped
+        self.check_small()
+
+        self.vm.hmp_qemu_io('drive0', 'write 1M 1M')
+
+        result = self.vm.qmp('block-commit', device='overlay')
+        self.assert_qmp(result, 'return', {})
+        self.complete_and_wait()
+
+        # commit of new megabyte should trigger preallocation
+        self.check_big()
+
+    def test_reopen_opts(self):
+        result = self.vm.qmp('x-blockdev-reopen', **{
+            'node-name': 'disk',
+            'driver': iotests.imgfmt,
+            'file': {
+                'node-name': 'filter',
+                'driver': 'preallocate',
+                'prealloc-size': 20 * MiB,
+                'prealloc-align': 5 * MiB,
+                'file': {
+                    'node-name': 'file',
+                    'driver': 'file',
+                    'filename': disk
+                }
+            }
+        })
+        self.assert_qmp(result, 'return', {})
+
+        self.vm.hmp_qemu_io('drive0', 'write 0 1M')
+        self.assertTrue(os.path.getsize(disk) == 25 * MiB)
+
+
+class TestTruncate(iotests.QMPTestCase):
+    def setUp(self):
+        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
+        iotests.qemu_img_create('-f', iotests.imgfmt, refdisk, str(10 * MiB))
+
+    def tearDown(self):
+        os.remove(disk)
+        os.remove(refdisk)
+
+    def do_test(self, prealloc_mode, new_size):
+        ret = iotests.qemu_io_silent('--image-opts', '-c', 'write 0 10M', '-c',
+                                     f'truncate -m {prealloc_mode} {new_size}',
+                                     drive_opts)
+        self.assertEqual(ret, 0)
+
+        ret = iotests.qemu_io_silent('-f', iotests.imgfmt, '-c', 'write 0 10M',
+                                     '-c',
+                                     f'truncate -m {prealloc_mode} {new_size}',
+                                     refdisk)
+        self.assertEqual(ret, 0)
+
+        stat = os.stat(disk)
+        refstat = os.stat(refdisk)
+
+        # Probably we'll want preallocate filter to keep align to cluster when
+        # shrink preallocation, so, ignore small differece
+        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
+
+        # Preallocate filter may leak some internal clusters (for example, if
+        # guest write far over EOF, skipping some clusters - they will remain
+        # fallocated, preallocate filter don't care about such leaks, it drops
+        # only trailing preallocation.
+        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
+                        1024 * 1024)
+
+    def test_real_shrink(self):
+        self.do_test('off', '5M')
+
+    def test_truncate_inside_preallocated_area__falloc(self):
+        self.do_test('falloc', '50M')
+
+    def test_truncate_inside_preallocated_area__metadata(self):
+        self.do_test('metadata', '50M')
+
+    def test_truncate_inside_preallocated_area__full(self):
+        self.do_test('full', '50M')
+
+    def test_truncate_inside_preallocated_area__off(self):
+        self.do_test('off', '50M')
+
+    def test_truncate_over_preallocated_area__falloc(self):
+        self.do_test('falloc', '150M')
+
+    def test_truncate_over_preallocated_area__metadata(self):
+        self.do_test('metadata', '150M')
+
+    def test_truncate_over_preallocated_area__full(self):
+        self.do_test('full', '150M')
+
+    def test_truncate_over_preallocated_area__off(self):
+        self.do_test('off', '150M')
+
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2'])
diff --git a/tests/qemu-iotests/298.out b/tests/qemu-iotests/298.out
new file mode 100644
index 0000000000..fa16b5ccef
--- /dev/null
+++ b/tests/qemu-iotests/298.out
@@ -0,0 +1,5 @@
+.............
+----------------------------------------------------------------------
+Ran 13 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ff59cfd2d4..15d5f9619b 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -307,6 +307,7 @@
 295 rw
 296 rw
 297 meta
+298 auto quick
 299 auto quick
 300 migration
 301 backing quick
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 12/15] scripts/simplebench: support iops
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-25  8:54   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 13/15] scripts/simplebench: improve view of ascii table Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Support benchmarks returning not seconds but iops. We'll use it for
further new test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 scripts/simplebench/simplebench.py | 35 +++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 59e7314ff6..716d7fe9b2 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -24,9 +24,12 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
 
     test_func   -- benchmarking function with prototype
                    test_func(env, case), which takes test_env and test_case
-                   arguments and returns {'seconds': int} (which is benchmark
-                   result) on success and {'error': str} on error. Returned
-                   dict may contain any other additional fields.
+                   arguments and on success returns dict with 'seconds' or
+                   'iops' (or both) fields, specifying the benchmark result.
+                   If both 'iops' and 'seconds' provided, the 'iops' is
+                   considered the main, and 'seconds' is just an additional
+                   info. On failure test_func should return {'error': str}.
+                   Returned dict may contain any other additional fields.
     test_env    -- test environment - opaque first argument for test_func
     test_case   -- test case - opaque second argument for test_func
     count       -- how many times to call test_func, to calculate average
@@ -34,6 +37,7 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
 
     Returns dict with the following fields:
         'runs':     list of test_func results
+        'dimension': dimension of results, may be 'seconds' or 'iops'
         'average':  average seconds per run (exists only if at least one run
                     succeeded)
         'delta':    maximum delta between test_func result and the average
@@ -54,11 +58,20 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
 
     result = {'runs': runs}
 
-    successed = [r for r in runs if ('seconds' in r)]
+    successed = [r for r in runs if ('seconds' in r or 'iops' in r)]
     if successed:
-        avg = sum(r['seconds'] for r in successed) / len(successed)
+        dim = 'iops' if ('iops' in successed[0]) else 'seconds'
+        if 'iops' in successed[0]:
+            assert all('iops' in r for r in successed)
+            dim = 'iops'
+        else:
+            assert all('seconds' in r for r in successed)
+            assert all('iops' not in r for r in successed)
+            dim = 'seconds'
+        avg = sum(r[dim] for r in successed) / len(successed)
+        result['dimension'] = dim
         result['average'] = avg
-        result['delta'] = max(abs(r['seconds'] - avg) for r in successed)
+        result['delta'] = max(abs(r[dim] - avg) for r in successed)
 
     if len(successed) < count:
         result['n-failed'] = count - len(successed)
@@ -118,11 +131,17 @@ def ascii(results):
     """Return ASCII representation of bench() returned dict."""
     from tabulate import tabulate
 
+    dim = None
     tab = [[""] + [c['id'] for c in results['envs']]]
     for case in results['cases']:
         row = [case['id']]
         for env in results['envs']:
-            row.append(ascii_one(results['tab'][case['id']][env['id']]))
+            res = results['tab'][case['id']][env['id']]
+            if dim is None:
+                dim = res['dimension']
+            else:
+                assert dim == res['dimension']
+            row.append(ascii_one(res))
         tab.append(row)
 
-    return tabulate(tab)
+    return f'All results are in {dim}\n\n' + tabulate(tab)
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 13/15] scripts/simplebench: improve view of ascii table
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 12/15] scripts/simplebench: support iops Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-25  9:31   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line Vladimir Sementsov-Ogievskiy
  2020-09-18 18:19 ` [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py Vladimir Sementsov-Ogievskiy
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Introduce dynamic float precision and use percentage to show delta.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 scripts/simplebench/simplebench.py | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 716d7fe9b2..56d3a91ea2 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -79,10 +79,34 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
     return result
 
 
+def format_float(x):
+    res = round(x)
+    if res >= 100:
+        return str(res)
+
+    res = f'{x:.1f}'
+    if len(res) >= 4:
+        return res
+
+    return f'{x:.2f}'
+
+
+def format_percent(x):
+    x *= 100
+
+    res = round(x)
+    if res >= 10:
+        return str(res)
+
+    return f'{x:.1f}' if res >= 1 else f'{x:.2f}'
+
+
 def ascii_one(result):
     """Return ASCII representation of bench_one() returned dict."""
     if 'average' in result:
-        s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
+        avg = result['average']
+        delta_pr = result['delta'] / avg
+        s = f'{format_float(avg)}±{format_percent(delta_pr)}%'
         if 'n-failed' in result:
             s += '\n({} failed)'.format(result['n-failed'])
         return s
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 13/15] scripts/simplebench: improve view of ascii table Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-25 10:24   ` Max Reitz
  2020-09-18 18:19 ` [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py Vladimir Sementsov-Ogievskiy
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Performance improvements / degradations are usually discussed in
percentage. Let's make the script calculate it for us.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
index 56d3a91ea2..0ff05a38b8 100644
--- a/scripts/simplebench/simplebench.py
+++ b/scripts/simplebench/simplebench.py
@@ -153,14 +153,22 @@ def bench(test_func, test_envs, test_cases, *args, **vargs):
 
 def ascii(results):
     """Return ASCII representation of bench() returned dict."""
-    from tabulate import tabulate
+    import tabulate
+
+    # We want leading whitespace for difference row cells (see below)
+    tabulate.PRESERVE_WHITESPACE = True
 
     dim = None
-    tab = [[""] + [c['id'] for c in results['envs']]]
+    tab = [
+        # Environment columns are named A, B, ...
+        [""] + [chr(ord('A') + i) for i in range(len(results['envs']))],
+        [""] + [c['id'] for c in results['envs']]
+    ]
     for case in results['cases']:
         row = [case['id']]
+        case_results = results['tab'][case['id']]
         for env in results['envs']:
-            res = results['tab'][case['id']][env['id']]
+            res = case_results[env['id']]
             if dim is None:
                 dim = res['dimension']
             else:
@@ -168,4 +176,34 @@ def ascii(results):
             row.append(ascii_one(res))
         tab.append(row)
 
-    return f'All results are in {dim}\n\n' + tabulate(tab)
+        # Add row of difference between column. For each column starting from
+        # B we calculate difference with all previous columns.
+        row = ['', '']  # case name and first column
+        for i in range(1, len(results['envs'])):
+            cell = ''
+            env = results['envs'][i]
+            res = case_results[env['id']]
+
+            if 'average' not in res:
+                # Failed result
+                row.append(cell)
+                continue
+
+            for j in range(0, i):
+                env_j = results['envs'][j]
+                res_j = case_results[env_j['id']]
+
+                if 'average' not in res_j:
+                    # Failed result
+                    cell += ' --'
+                    continue
+
+                col_j = chr(ord('A') + j)
+                avg_j = res_j['average']
+                delta = (res['average'] - avg_j) / avg_j * 100
+                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
+                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
+            row.append(cell)
+        tab.append(row)
+
+    return f'All results are in {dim}\n\n' + tabulate.tabulate(tab)
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py
  2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2020-09-18 18:19 ` [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line Vladimir Sementsov-Ogievskiy
@ 2020-09-18 18:19 ` Vladimir Sementsov-Ogievskiy
  2020-09-25 11:25   ` Max Reitz
  14 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-18 18:19 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, vsementsov, armbru, eblake, fam, stefanha, mreitz,
	kwolf, den

Benchmark for new preallocate filter.

Example usage:
    ./bench_prealloc.py ../../build/qemu-img \
        ssd-ext4:/path/to/mount/point \
        ssd-xfs:/path2 hdd-ext4:/path3 hdd-xfs:/path4

The benchmark shows performance improvement (or degradation) when use
new preallocate filter with qcow2 image.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 scripts/simplebench/bench_prealloc.py | 128 ++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)
 create mode 100755 scripts/simplebench/bench_prealloc.py

diff --git a/scripts/simplebench/bench_prealloc.py b/scripts/simplebench/bench_prealloc.py
new file mode 100755
index 0000000000..fda4b3410e
--- /dev/null
+++ b/scripts/simplebench/bench_prealloc.py
@@ -0,0 +1,128 @@
+#!/usr/bin/env python3
+#
+# Benchmark preallocate filter
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+
+import sys
+import os
+import subprocess
+import re
+
+import simplebench
+
+
+def qemu_img_bench(args):
+    p = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
+                       universal_newlines=True)
+
+    if p.returncode == 0:
+        try:
+            m = re.search(r'Run completed in (\d+.\d+) seconds.', p.stdout)
+            return {'seconds': float(m.group(1))}
+        except Exception:
+            return {'error': f'failed to parse qemu-img output: {p.stdout}'}
+    else:
+        return {'error': f'qemu-img failed: {p.returncode}: {p.stdout}'}
+
+
+def bench_func(env, case):
+    fname = f"{case['dir']}/prealloc-test.qcow2"
+    try:
+        os.remove(fname)
+    except OSError:
+        pass
+
+    subprocess.run([env['qemu-img-binary'], 'create', '-f', 'qcow2', fname,
+                   '16G'], stdout=subprocess.DEVNULL,
+                   stderr=subprocess.DEVNULL, check=True)
+
+    args = [env['qemu-img-binary'], 'bench', '-c', str(case['count']),
+            '-d', '64', '-s', case['block-size'], '-t', 'none', '-n', '-w']
+    if env['prealloc']:
+        args += ['--image-opts',
+                 'driver=qcow2,file.driver=preallocate,file.file.driver=file,'
+                 f'file.file.filename={fname}']
+    else:
+        args += ['-f', 'qcow2', fname]
+
+    return qemu_img_bench(args)
+
+
+def auto_count_bench_func(env, case):
+    case['count'] = 100
+    while True:
+        res = bench_func(env, case)
+        if 'error' in res:
+            return res
+
+        if res['seconds'] >= 1:
+            break
+
+        case['count'] *= 10
+
+    if res['seconds'] < 5:
+        case['count'] = round(case['count'] * 5 / res['seconds'])
+        res = bench_func(env, case)
+        if 'error' in res:
+            return res
+
+    res['iops'] = case['count'] / res['seconds']
+    return res
+
+
+if __name__ == '__main__':
+    if len(sys.argv) < 2:
+        print(f'USAGE: {sys.argv[0]} <qemu-img binary> '
+              'DISK_NAME:DIR_PATH ...')
+        exit(1)
+
+    qemu_img = sys.argv[1]
+
+    envs = [
+        {
+            'id': 'no-prealloc',
+            'qemu-img-binary': qemu_img,
+            'prealloc': False
+        },
+        {
+            'id': 'prealloc',
+            'qemu-img-binary': qemu_img,
+            'prealloc': True
+        }
+    ]
+
+    aligned_cases = []
+    unaligned_cases = []
+
+    for disk in sys.argv[2:]:
+        name, path = disk.split(':')
+        aligned_cases.append({
+            'id': f'{name}, aligned sequential 16k',
+            'block-size': '16k',
+            'dir': path
+        })
+        unaligned_cases.append({
+            'id': f'{name}, unaligned sequential 64k',
+            'block-size': '16k',
+            'dir': path
+        })
+
+    result = simplebench.bench(auto_count_bench_func, envs,
+                               aligned_cases + unaligned_cases, count=5)
+    print(simplebench.ascii(result))
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway
  2020-09-18 18:19 ` [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway Vladimir Sementsov-Ogievskiy
@ 2020-09-24 14:25   ` Max Reitz
  2020-09-24 14:55     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2020-09-24 14:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 2177 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Do generic processing even for drivers which define .bdrv_check_perm
> handler. It's needed for further preallocate filter: it will need to do
> additional action on bdrv_check_perm, but don't want to reimplement
> generic logic.
> 
> The patch doesn't change existing behaviour: the only driver that
> implements bdrv_check_perm is file-posix, but it never has any
> children.
> 
> Also, bdrv_set_perm() don't stop processing if driver has
> .bdrv_set_perm handler as well.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 9538af4884..165c2d3cb2 100644
> --- a/block.c
> +++ b/block.c
> @@ -1964,8 +1964,7 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
>  /*
>   * Check whether permissions on this node can be changed in a way that
>   * @cumulative_perms and @cumulative_shared_perms are the new cumulative
> - * permissions of all its parents. This involves checking whether all necessary
> - * permission changes to child nodes can be performed.
> + * permissions of all its parents.

Why do you want to remove this sentence?

>   *
>   * Will set *tighten_restrictions to true if and only if new permissions have to
>   * be taken or currently shared permissions are to be unshared.  Otherwise,
> @@ -2047,8 +2046,11 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>      }
>  
>      if (drv->bdrv_check_perm) {
> -        return drv->bdrv_check_perm(bs, cumulative_perms,
> -                                    cumulative_shared_perms, errp);
> +        ret = drv->bdrv_check_perm(bs, cumulative_perms,
> +                                   cumulative_shared_perms, errp);
> +        if (ret < 0) {
> +            return ret;
> +        }
>      }

Sounds good.  It’s also consistent with how bdrv_abort_perm_update() and
bdrv_set_perm() don’t return after calling the respective driver
functions, but always recurse to the children.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway
  2020-09-24 14:25   ` Max Reitz
@ 2020-09-24 14:55     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-24 14:55 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

24.09.2020 17:25, Max Reitz wrote:
> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>> Do generic processing even for drivers which define .bdrv_check_perm
>> handler. It's needed for further preallocate filter: it will need to do
>> additional action on bdrv_check_perm, but don't want to reimplement
>> generic logic.
>>
>> The patch doesn't change existing behaviour: the only driver that
>> implements bdrv_check_perm is file-posix, but it never has any
>> children.
>>
>> Also, bdrv_set_perm() don't stop processing if driver has
>> .bdrv_set_perm handler as well.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block.c | 10 ++++++----
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 9538af4884..165c2d3cb2 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1964,8 +1964,7 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
>>   /*
>>    * Check whether permissions on this node can be changed in a way that
>>    * @cumulative_perms and @cumulative_shared_perms are the new cumulative
>> - * permissions of all its parents. This involves checking whether all necessary
>> - * permission changes to child nodes can be performed.
>> + * permissions of all its parents.
> 
> Why do you want to remove this sentence?

Really strange :) I don't know. I remember that I've modified some comment working on this series, and it was important... But this sentence become even more obviously correct with this patch.

> 
>>    *
>>    * Will set *tighten_restrictions to true if and only if new permissions have to
>>    * be taken or currently shared permissions are to be unshared.  Otherwise,
>> @@ -2047,8 +2046,11 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>       }
>>   
>>       if (drv->bdrv_check_perm) {
>> -        return drv->bdrv_check_perm(bs, cumulative_perms,
>> -                                    cumulative_shared_perms, errp);
>> +        ret = drv->bdrv_check_perm(bs, cumulative_perms,
>> +                                   cumulative_shared_perms, errp);
>> +        if (ret < 0) {
>> +            return ret;
>> +        }
>>       }
> 
> Sounds good.  It’s also consistent with how bdrv_abort_perm_update() and
> bdrv_set_perm() don’t return after calling the respective driver
> functions, but always recurse to the children.
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 08/15] block: introduce preallocate filter
  2020-09-18 18:19 ` [PATCH v6 08/15] block: introduce preallocate filter Vladimir Sementsov-Ogievskiy
@ 2020-09-24 16:50   ` Max Reitz
  2020-09-24 17:36     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2020-09-24 16:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 11087 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> It's intended to be inserted between format and protocol nodes to
> preallocate additional space (expanding protocol file) on writes
> crossing EOF. It improves performance for file-systems with slow
> allocation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  docs/system/qemu-block-drivers.rst.inc |  26 ++
>  qapi/block-core.json                   |  20 +-
>  block/preallocate.c                    | 556 +++++++++++++++++++++++++
>  block/meson.build                      |   1 +
>  4 files changed, 602 insertions(+), 1 deletion(-)
>  create mode 100644 block/preallocate.c

Looks good to me in general.

[...]

> diff --git a/block/preallocate.c b/block/preallocate.c
> new file mode 100644
> index 0000000000..6510ad0149
> --- /dev/null
> +++ b/block/preallocate.c

[...]

> +/*
> + * Handle reopen.
> + *
> + * We must implement reopen handlers, otherwise reopen just don't work. Handle
> + * new options and don't care about preallocation state, as it is handled in
> + * set/check permission handlers.
> + */
> +
> +static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
> +                                      BlockReopenQueue *queue, Error **errp)
> +{
> +    PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
> +
> +    if (!preallocate_absorb_opts(opts, reopen_state->options,
> +                                 reopen_state->bs->file->bs, errp)) {

I tried to find out whether this refers to the old file child, or the
post-reopen one.  As far as I could find out, there is no generic
implementation for changing the file child as part of x-blockdev-reopen:

{"error": {"class": "GenericError", "desc": "Cannot change the option
'file'"}}

Now that’s a shame.  That means you can’t reasonably integrate a
preallocate filter into an existing node graph unless the format driver
checks for the respective child option and issues a replace_node on
commit or something, right?  I suppose any driver who’d want to
implement child replacement would need to attach the new node in
prepare() as some pseudo-child, and then drop the old one and rename the
new one in commit().  I don’t think any driver does that yet, so I
suppose no format driver allows replacement of children yet (except for
quorum...).

Does anyone know what the status on that is?  Are there any plans for
implementing child replacement in reopen, or did I just miss something?

(It looks like the backing child can be replaced, but that’s probably
not a child where the preallocate filter would be placed on top...).

> +        g_free(opts);
> +        return -EINVAL;
> +    }
> +
> +    reopen_state->opaque = opts;
> +
> +    return 0;
> +}

[...]

> +/*
> + * Call on each write. Returns true if @want_merge_zero is true and the region
> + * [offset, offset + bytes) is zeroed (as a result of this call or earlier
> + * preallocation).
> + *
> + * want_merge_zero is used to merge write-zero request with preallocation in
> + * one bdrv_co_pwrite_zeroes() call.
> + */
> +static bool coroutine_fn handle_write(BlockDriverState *bs, int64_t offset,
> +                                      int64_t bytes, bool want_merge_zero)
> +{
> +    BDRVPreallocateState *s = bs->opaque;
> +    int64_t end = offset + bytes;
> +    int64_t prealloc_start, prealloc_end;
> +    int ret;
> +
> +    if (!has_prealloc_perms(bs)) {

Took me a bit to figure out, because it takes a trip to
preallocate_child_perm() to figure out exactly when we’re going to have
the necessary permissions for this to pass.

Then it turns out that this is going to be the case exactly when the
parents collectively have the same permissions (WRITE+RESIZE) on the
preallocate node.

Then I had to wonder whether it’s appropriate not to preallocate if
WRITE is taken, but RESIZE isn’t.  But that seems entirely correct, yes.
 If noone is going to grow the file, then there is no need for
preallocation.  (Vice versa, if noone is going to write, but only
resize, then there is no need for preallocation either.)

So this seems correct, yes.

(It could be argued that if one parent has WRITE, and another has RESIZE
(but neither has both), then we probably don’t need preallocation
either.  But in such an arcane case (which is impossible to figure out
in .bdrv_child_perm() anyway), we might as well just do preallocation.
Won’t hurt.)

> +        /* We don't have state neither should try to recover it */
> +        return false;
> +    }
> +
> +    if (s->data_end < 0) {
> +        s->data_end = bdrv_getlength(bs->file->bs);
> +        if (s->data_end < 0) {
> +            return false;
> +        }
> +
> +        if (s->file_end < 0) {
> +            s->file_end = s->data_end;
> +        }
> +    }
> +
> +    if (end <= s->data_end) {
> +        return false;
> +    }
> +
> +    /* We have valid s->data_end, and request writes beyond it. */
> +
> +    s->data_end = end;
> +    if (s->zero_start < 0 || !want_merge_zero) {
> +        s->zero_start = end;

Skipping this on want_merge_zero == true means that zero writes can be
cached; if you repeatedly perform zero writes into the preallocated
area, then none of those will actually be executed.  I legitimately
don’t know whether that’s OK.

I suppose until someone tells me it isn’t OK, I’ll believe it is.

> +    }
> +
> +    if (s->file_end < 0) {
> +        s->file_end = bdrv_getlength(bs->file->bs);
> +        if (s->file_end < 0) {
> +            return false;
> +        }
> +    }
> +
> +    /* Now s->data_end, s->zero_start and s->file_end are valid. */
> +
> +    if (end <= s->file_end) {
> +        /* No preallocation needed. */
> +        return want_merge_zero && offset >= s->zero_start;
> +    }
> +
> +    /* Now we want new preallocation, as request writes beyond s->data_end. */

s/data_end/file_end/

> +
> +    prealloc_start = want_merge_zero ? MIN(offset, s->file_end) : s->file_end;

I suppose you intentionally use s->file_end here instead of @end, even
if offset <= s->file_end.  I just mention it because I wonder whether
it’s really better to effectively write twice to the same area in such
cases (once zeroes for preallocation, then immediately the data) instead
of only writing the data and then preallocating past it.

(Though if it were the same code just with @end instead of s->file_end
for offset <= s->file_end, then the order would be to preallocate past
@end, and then to write the data.  Which might be suboptimal in terms of
how the blocks are then ordered in the filesystem.)

> +    prealloc_end = QEMU_ALIGN_UP(offset + bytes + s->opts.prealloc_size,

s/offset + bytes/end/?

> +                                 s->opts.prealloc_align);
> +    s->file_end = end;

Why is this set here, when it’s always overwritten after
bdrv_co_pwrite_zeroes() anyway?

(It also seems a bit wrong, because at this point we don’t know yet
whether the data write is going to succeed, so we don’t know for sure
whether the file end is really going to be @end without the preallocation.)

> +
> +    ret = bdrv_co_pwrite_zeroes(
> +            bs->file, prealloc_start, prealloc_end - prealloc_start,
> +            BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
> +    if (ret < 0) {
> +        s->file_end = ret;
> +        return false;
> +    }
> +
> +    s->file_end = prealloc_end;
> +    return want_merge_zero;
> +}
> +
> +static int coroutine_fn preallocate_co_pwrite_zeroes(BlockDriverState *bs,
> +        int64_t offset, int bytes, BdrvRequestFlags flags)
> +{
> +    bool want_merge_zero =
> +        (flags & (BDRV_REQ_ZERO_WRITE | BDRV_REQ_NO_FALLBACK)) == flags;

Isn’t this the same as !(flags & ~(ZERO_WRITE | NO_FALLBACK))?  (Maybe
only I would find that simpler to understand, though.)

> +    if (handle_write(bs, offset, bytes, want_merge_zero)) {
> +        return 0;
> +    }
> +
> +    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
> +}

[...]

> +static int coroutine_fn
> +preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
> +                        bool exact, PreallocMode prealloc,
> +                        BdrvRequestFlags flags, Error **errp)
> +{
> +    ERRP_GUARD();
> +    BDRVPreallocateState *s = bs->opaque;
> +    int ret;
> +
> +    if (s->data_end >= 0 && offset > s->data_end) {
> +        if (s->file_end < 0) {
> +            s->file_end = bdrv_getlength(bs->file->bs);
> +            if (s->file_end < 0) {
> +                error_setg(errp, "failed to get file length");
> +                return s->file_end;
> +            }
> +        }
> +
> +        if (prealloc == PREALLOC_MODE_FALLOC) {
> +            /*
> +             * If offset <= s->file_end, the task is already done, just
> +             * update s->file_end, to move part of "filter preallocation"

s/file_end/data_end/

> +             * to "preallocation requested by user".
> +             * Otherwise just proceed to preallocate missing part.
> +             */
> +            if (offset <= s->file_end) {
> +                s->data_end = offset;
> +                return 0;
> +            }

[...]

> +static int preallocate_check_perm(BlockDriverState *bs,
> +                                  uint64_t perm, uint64_t shared, Error **errp)
> +{
> +    BDRVPreallocateState *s = bs->opaque;
> +
> +    if (s->data_end >= 0 && !can_write_resize(perm)) {
> +        /*
> +         * Loose permissions.

*Lose

(I assume)

> +         * We should truncate in check_perm, as in set_perm bs->file->perm will
> +         * be already changed, and we should not violate it.
> +         */
> +        if (s->file_end < 0) {
> +            s->file_end = bdrv_getlength(bs->file->bs);
> +            if (s->file_end < 0) {
> +                error_setg(errp, "Failed to get file length");
> +                return s->file_end;
> +            }
> +        }
> +
> +        if (s->data_end < s->file_end) {
> +            int ret = bdrv_truncate(bs->file, s->data_end, true,
> +                                    PREALLOC_MODE_OFF, 0, NULL);
> +            if (ret < 0) {
> +                error_setg(errp, "Failed to drop preallocation");
> +                s->file_end = ret;
> +                return ret;
> +            }
> +        }
> +        /*
> +         * We will drop our permissions, as well as allow shared
> +         * permissions, anyone will be able to change the child, so mark
> +         * all states invalid. We'll regain control if get good permissions
> +         * back.
> +         */
> +        s->data_end = s->file_end = s->zero_start = -EINVAL;

Shouldn’t we clear these variables whenever !can_write_resize(perm), not
just if also s->data_end >= 0?

> +    }
> +
> +    return 0;
> +}

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command
  2020-09-18 18:19 ` [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command Vladimir Sementsov-Ogievskiy
@ 2020-09-24 17:08   ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2020-09-24 17:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 1222 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> This will be used in further test.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  qemu-io-cmds.c | 46 ++++++++++++++++++++++++++++++++--------------
>  1 file changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index baeae86d8c..64f0246a71 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -1698,13 +1698,42 @@ static const cmdinfo_t flush_cmd = {
>      .oneline    = "flush all in-core file state to disk",
>  };
>  
> +static int truncate_f(BlockBackend *blk, int argc, char **argv);
> +static const cmdinfo_t truncate_cmd = {
> +    .name       = "truncate",
> +    .altname    = "t",
> +    .cfunc      = truncate_f,
> +    .perm       = BLK_PERM_WRITE | BLK_PERM_RESIZE,
> +    .argmin     = 1,
> +    .argmax     = 3,
> +    .args       = "[-m prealloc_mode] off",
> +    .oneline    = "truncates the current file at the given offset",
> +};
> +

Forward-declaring truncate_cmd instead of truncate_f would mean a
smaller diffstat.

But that isn’t what other commands do, so.

*shrug*

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 08/15] block: introduce preallocate filter
  2020-09-24 16:50   ` Max Reitz
@ 2020-09-24 17:36     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-24 17:36 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

24.09.2020 19:50, Max Reitz wrote:
> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>> It's intended to be inserted between format and protocol nodes to
>> preallocate additional space (expanding protocol file) on writes
>> crossing EOF. It improves performance for file-systems with slow
>> allocation.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   docs/system/qemu-block-drivers.rst.inc |  26 ++
>>   qapi/block-core.json                   |  20 +-
>>   block/preallocate.c                    | 556 +++++++++++++++++++++++++
>>   block/meson.build                      |   1 +
>>   4 files changed, 602 insertions(+), 1 deletion(-)
>>   create mode 100644 block/preallocate.c
> 
> Looks good to me in general.
> 
> [...]
> 
>> diff --git a/block/preallocate.c b/block/preallocate.c
>> new file mode 100644
>> index 0000000000..6510ad0149
>> --- /dev/null
>> +++ b/block/preallocate.c
> 
> [...]
> 
>> +/*
>> + * Handle reopen.
>> + *
>> + * We must implement reopen handlers, otherwise reopen just don't work. Handle
>> + * new options and don't care about preallocation state, as it is handled in
>> + * set/check permission handlers.
>> + */
>> +
>> +static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
>> +                                      BlockReopenQueue *queue, Error **errp)
>> +{
>> +    PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
>> +
>> +    if (!preallocate_absorb_opts(opts, reopen_state->options,
>> +                                 reopen_state->bs->file->bs, errp)) {
> 
> I tried to find out whether this refers to the old file child, or the
> post-reopen one.

Note, that it's needed only to check request_alignment. Probably it's better to pass request_alignment to preallocate_absorb_opts, not the whole child.

>  As far as I could find out, there is no generic
> implementation for changing the file child as part of x-blockdev-reopen:
> 
> {"error": {"class": "GenericError", "desc": "Cannot change the option
> 'file'"}}
> 
> Now that’s a shame.  That means you can’t reasonably integrate a
> preallocate filter into an existing node graph unless the format driver
> checks for the respective child option and issues a replace_node on
> commit or something, right?  I suppose any driver who’d want to
> implement child replacement would need to attach the new node in
> prepare() as some pseudo-child, and then drop the old one and rename the
> new one in commit().  I don’t think any driver does that yet, so I
> suppose no format driver allows replacement of children yet (except for
> quorum...).
> 
> Does anyone know what the status on that is?  Are there any plans for
> implementing child replacement in reopen, or did I just miss something?
> 
> (It looks like the backing child can be replaced, but that’s probably
> not a child where the preallocate filter would be placed on top...).

Hm. I didn't care about it, because main use case is to insert the filter at start, specifying it in -drive or in -blockdev options.

Probably, we need a separate API which will insert/remove filters like it is done in block jobs code, not reopening the block device.

> 
>> +        g_free(opts);
>> +        return -EINVAL;
>> +    }
>> +
>> +    reopen_state->opaque = opts;
>> +
>> +    return 0;
>> +}
> 
> [...]
> 
>> +/*
>> + * Call on each write. Returns true if @want_merge_zero is true and the region
>> + * [offset, offset + bytes) is zeroed (as a result of this call or earlier
>> + * preallocation).
>> + *
>> + * want_merge_zero is used to merge write-zero request with preallocation in
>> + * one bdrv_co_pwrite_zeroes() call.
>> + */
>> +static bool coroutine_fn handle_write(BlockDriverState *bs, int64_t offset,
>> +                                      int64_t bytes, bool want_merge_zero)
>> +{
>> +    BDRVPreallocateState *s = bs->opaque;
>> +    int64_t end = offset + bytes;
>> +    int64_t prealloc_start, prealloc_end;
>> +    int ret;
>> +
>> +    if (!has_prealloc_perms(bs)) {
> 
> Took me a bit to figure out, because it takes a trip to
> preallocate_child_perm() to figure out exactly when we’re going to have
> the necessary permissions for this to pass.
> 
> Then it turns out that this is going to be the case exactly when the
> parents collectively have the same permissions (WRITE+RESIZE) on the
> preallocate node.
> 
> Then I had to wonder whether it’s appropriate not to preallocate if
> WRITE is taken, but RESIZE isn’t.  But that seems entirely correct, yes.
>   If noone is going to grow the file, then there is no need for
> preallocation.  (Vice versa, if noone is going to write, but only
> resize, then there is no need for preallocation either.)
> 
> So this seems correct, yes.
> 
> (It could be argued that if one parent has WRITE, and another has RESIZE
> (but neither has both), then we probably don’t need preallocation
> either.  But in such an arcane case (which is impossible to figure out
> in .bdrv_child_perm() anyway), we might as well just do preallocation.
> Won’t hurt.)
> 
>> +        /* We don't have state neither should try to recover it */
>> +        return false;
>> +    }
>> +
>> +    if (s->data_end < 0) {
>> +        s->data_end = bdrv_getlength(bs->file->bs);
>> +        if (s->data_end < 0) {
>> +            return false;
>> +        }
>> +
>> +        if (s->file_end < 0) {
>> +            s->file_end = s->data_end;
>> +        }
>> +    }
>> +
>> +    if (end <= s->data_end) {
>> +        return false;
>> +    }
>> +
>> +    /* We have valid s->data_end, and request writes beyond it. */
>> +
>> +    s->data_end = end;
>> +    if (s->zero_start < 0 || !want_merge_zero) {
>> +        s->zero_start = end;
> 
> Skipping this on want_merge_zero == true means that zero writes can be
> cached; if you repeatedly perform zero writes into the preallocated
> area, then none of those will actually be executed.  I legitimately
> don’t know whether that’s OK.
> 
> I suppose until someone tells me it isn’t OK, I’ll believe it is.

Skipping zero-writes to preallocated area is significant for performance.
If I remember correctly, handle_alloc_space() in qcow2 will otherwise do extra write-zeroes, which reduces performance.

So, we assume that zero-write after EOF is a kind of preallocation. And avoid preallocating same region twice.

Using zero_start instead of data_end is probably less significant, I'm not sure.

> 
>> +    }
>> +
>> +    if (s->file_end < 0) {
>> +        s->file_end = bdrv_getlength(bs->file->bs);
>> +        if (s->file_end < 0) {
>> +            return false;
>> +        }
>> +    }
>> +
>> +    /* Now s->data_end, s->zero_start and s->file_end are valid. */
>> +
>> +    if (end <= s->file_end) {
>> +        /* No preallocation needed. */
>> +        return want_merge_zero && offset >= s->zero_start;
>> +    }
>> +
>> +    /* Now we want new preallocation, as request writes beyond s->data_end. */
> 
> s/data_end/file_end/> 
>> +
>> +    prealloc_start = want_merge_zero ? MIN(offset, s->file_end) : s->file_end;
> 
> I suppose you intentionally use s->file_end here instead of @end, even
> if offset <= s->file_end.  I just mention it because I wonder whether
> it’s really better to effectively write twice to the same area in such
> cases (once zeroes for preallocation, then immediately the data) instead
> of only writing the data and then preallocating past it.
> 
> (Though if it were the same code just with @end instead of s->file_end
> for offset <= s->file_end, then the order would be to preallocate past
> @end, and then to write the data.  Which might be suboptimal in terms of
> how the blocks are then ordered in the filesystem.)

Yes, I think it's better to preallocate from file_end, in this way we do our best for good file blocks location in filesystem. And writing twice shouldn't really matter with big enough preallocation size (a lot larger than usual guest write operation).

> 
>> +    prealloc_end = QEMU_ALIGN_UP(offset + bytes + s->opts.prealloc_size,
> 
> s/offset + bytes/end/?

yes

> 
>> +                                 s->opts.prealloc_align);
>> +    s->file_end = end;
> 
> Why is this set here, when it’s always overwritten after
> bdrv_co_pwrite_zeroes() anyway?
> 
> (It also seems a bit wrong, because at this point we don’t know yet
> whether the data write is going to succeed, so we don’t know for sure
> whether the file end is really going to be @end without the preallocation.)

Agree, that's strange and seems wrong. Will drop.

> 
>> +
>> +    ret = bdrv_co_pwrite_zeroes(
>> +            bs->file, prealloc_start, prealloc_end - prealloc_start,
>> +            BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
>> +    if (ret < 0) {
>> +        s->file_end = ret;
>> +        return false;
>> +    }
>> +
>> +    s->file_end = prealloc_end;
>> +    return want_merge_zero;
>> +}
>> +
>> +static int coroutine_fn preallocate_co_pwrite_zeroes(BlockDriverState *bs,
>> +        int64_t offset, int bytes, BdrvRequestFlags flags)
>> +{
>> +    bool want_merge_zero =
>> +        (flags & (BDRV_REQ_ZERO_WRITE | BDRV_REQ_NO_FALLBACK)) == flags;
> 
> Isn’t this the same as !(flags & ~(ZERO_WRITE | NO_FALLBACK))?  (Maybe
> only I would find that simpler to understand, though.)

Actually, I think yours is more usual notation and it uses "flags" once. Will change.

> 
>> +    if (handle_write(bs, offset, bytes, want_merge_zero)) {
>> +        return 0;
>> +    }
>> +
>> +    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
>> +}
> 
> [...]
> 
>> +static int coroutine_fn
>> +preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
>> +                        bool exact, PreallocMode prealloc,
>> +                        BdrvRequestFlags flags, Error **errp)
>> +{
>> +    ERRP_GUARD();
>> +    BDRVPreallocateState *s = bs->opaque;
>> +    int ret;
>> +
>> +    if (s->data_end >= 0 && offset > s->data_end) {
>> +        if (s->file_end < 0) {
>> +            s->file_end = bdrv_getlength(bs->file->bs);
>> +            if (s->file_end < 0) {
>> +                error_setg(errp, "failed to get file length");
>> +                return s->file_end;
>> +            }
>> +        }
>> +
>> +        if (prealloc == PREALLOC_MODE_FALLOC) {
>> +            /*
>> +             * If offset <= s->file_end, the task is already done, just
>> +             * update s->file_end, to move part of "filter preallocation"
> 
> s/file_end/data_end/
> 
>> +             * to "preallocation requested by user".
>> +             * Otherwise just proceed to preallocate missing part.
>> +             */
>> +            if (offset <= s->file_end) {
>> +                s->data_end = offset;
>> +                return 0;
>> +            }
> 
> [...]
> 
>> +static int preallocate_check_perm(BlockDriverState *bs,
>> +                                  uint64_t perm, uint64_t shared, Error **errp)
>> +{
>> +    BDRVPreallocateState *s = bs->opaque;
>> +
>> +    if (s->data_end >= 0 && !can_write_resize(perm)) {
>> +        /*
>> +         * Loose permissions.
> 
> *Lose
> 
> (I assume)

Oops)

> 
>> +         * We should truncate in check_perm, as in set_perm bs->file->perm will
>> +         * be already changed, and we should not violate it.
>> +         */
>> +        if (s->file_end < 0) {
>> +            s->file_end = bdrv_getlength(bs->file->bs);
>> +            if (s->file_end < 0) {
>> +                error_setg(errp, "Failed to get file length");
>> +                return s->file_end;
>> +            }
>> +        }
>> +
>> +        if (s->data_end < s->file_end) {
>> +            int ret = bdrv_truncate(bs->file, s->data_end, true,
>> +                                    PREALLOC_MODE_OFF, 0, NULL);
>> +            if (ret < 0) {
>> +                error_setg(errp, "Failed to drop preallocation");
>> +                s->file_end = ret;
>> +                return ret;
>> +            }
>> +        }
>> +        /*
>> +         * We will drop our permissions, as well as allow shared
>> +         * permissions, anyone will be able to change the child, so mark
>> +         * all states invalid. We'll regain control if get good permissions
>> +         * back.
>> +         */
>> +        s->data_end = s->file_end = s->zero_start = -EINVAL;
> 
> Shouldn’t we clear these variables whenever !can_write_resize(perm), not
> just if also s->data_end >= 0?

Yes, will fix.

> 
>> +    }
>> +
>> +    return 0;
>> +}
> 
> Max
> 

Thanks!

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts
  2020-09-18 18:19 ` [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts Vladimir Sementsov-Ogievskiy
@ 2020-09-25  7:10   ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2020-09-25  7:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 282 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/iotests.py | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-18 18:19 ` [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver Vladimir Sementsov-Ogievskiy
@ 2020-09-25  8:26   ` Max Reitz
  2020-09-25  8:49     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2020-09-25  8:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 3521 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/298     | 186 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/298.out |   5 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 192 insertions(+)
>  create mode 100644 tests/qemu-iotests/298
>  create mode 100644 tests/qemu-iotests/298.out
> 
> diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
> new file mode 100644
> index 0000000000..fef10f6a7a
> --- /dev/null
> +++ b/tests/qemu-iotests/298

[...]

> +class TestPreallocateBase(iotests.QMPTestCase):

Perhaps a

@iotests.skip_if_unsupported(['preallocate'])

here?

> +    def setUp(self):
> +        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))

[...]

> +class TestTruncate(iotests.QMPTestCase):

The same decorator could be placed here, although this class doesn’t
start a VM, and so is unaffected by the allowlist.  Still may be
relevant in case of block modules, I don’t know.

> +    def setUp(self):
> +        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
> +        iotests.qemu_img_create('-f', iotests.imgfmt, refdisk, str(10 * MiB))
> +
> +    def tearDown(self):
> +        os.remove(disk)
> +        os.remove(refdisk)
> +
> +    def do_test(self, prealloc_mode, new_size):
> +        ret = iotests.qemu_io_silent('--image-opts', '-c', 'write 0 10M', '-c',
> +                                     f'truncate -m {prealloc_mode} {new_size}',
> +                                     drive_opts)
> +        self.assertEqual(ret, 0)
> +
> +        ret = iotests.qemu_io_silent('-f', iotests.imgfmt, '-c', 'write 0 10M',
> +                                     '-c',
> +                                     f'truncate -m {prealloc_mode} {new_size}',
> +                                     refdisk)
> +        self.assertEqual(ret, 0)
> +
> +        stat = os.stat(disk)
> +        refstat = os.stat(refdisk)
> +
> +        # Probably we'll want preallocate filter to keep align to cluster when
> +        # shrink preallocation, so, ignore small differece
> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
> +
> +        # Preallocate filter may leak some internal clusters (for example, if
> +        # guest write far over EOF, skipping some clusters - they will remain
> +        # fallocated, preallocate filter don't care about such leaks, it drops
> +        # only trailing preallocation.

True, but that isn’t what’s happening here.  (We only write 10M at 0, so
there are no gaps.)  Why do we need this 1M margin?

> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
> +                        1024 * 1024)

[...]

> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
> index ff59cfd2d4..15d5f9619b 100644
> --- a/tests/qemu-iotests/group
> +++ b/tests/qemu-iotests/group
> @@ -307,6 +307,7 @@
>  295 rw
>  296 rw
>  297 meta
> +298 auto quick

I wouldn’t mark it as quick, there is at least one preallocate=full of
140M, and one of 40M, plus multiple 10M data writes and falloc
preallocations.

Also, since you mark it as “auto”, have you run this test on all
CI-relevant hosts?  (Among other things I can’t predict) I wonder how
preallocation behaves on macOS.  Just because that one was always a bit
weird about not-really-data areas.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-25  8:26   ` Max Reitz
@ 2020-09-25  8:49     ` Vladimir Sementsov-Ogievskiy
  2020-09-25  9:11       ` Max Reitz
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-25  8:49 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

25.09.2020 11:26, Max Reitz wrote:
> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/298     | 186 +++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/298.out |   5 +
>>   tests/qemu-iotests/group   |   1 +
>>   3 files changed, 192 insertions(+)
>>   create mode 100644 tests/qemu-iotests/298
>>   create mode 100644 tests/qemu-iotests/298.out
>>
>> diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
>> new file mode 100644
>> index 0000000000..fef10f6a7a
>> --- /dev/null
>> +++ b/tests/qemu-iotests/298
> 
> [...]
> 
>> +class TestPreallocateBase(iotests.QMPTestCase):
> 
> Perhaps a
> 
> @iotests.skip_if_unsupported(['preallocate'])
> 
> here?
> 
>> +    def setUp(self):
>> +        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
> 
> [...]
> 
>> +class TestTruncate(iotests.QMPTestCase):
> 
> The same decorator could be placed here, although this class doesn’t
> start a VM, and so is unaffected by the allowlist.  Still may be
> relevant in case of block modules, I don’t know.

Or just global test skip at file top

> 
>> +    def setUp(self):
>> +        iotests.qemu_img_create('-f', iotests.imgfmt, disk, str(10 * MiB))
>> +        iotests.qemu_img_create('-f', iotests.imgfmt, refdisk, str(10 * MiB))
>> +
>> +    def tearDown(self):
>> +        os.remove(disk)
>> +        os.remove(refdisk)
>> +
>> +    def do_test(self, prealloc_mode, new_size):
>> +        ret = iotests.qemu_io_silent('--image-opts', '-c', 'write 0 10M', '-c',
>> +                                     f'truncate -m {prealloc_mode} {new_size}',
>> +                                     drive_opts)
>> +        self.assertEqual(ret, 0)
>> +
>> +        ret = iotests.qemu_io_silent('-f', iotests.imgfmt, '-c', 'write 0 10M',
>> +                                     '-c',
>> +                                     f'truncate -m {prealloc_mode} {new_size}',
>> +                                     refdisk)
>> +        self.assertEqual(ret, 0)
>> +
>> +        stat = os.stat(disk)
>> +        refstat = os.stat(refdisk)
>> +
>> +        # Probably we'll want preallocate filter to keep align to cluster when
>> +        # shrink preallocation, so, ignore small differece
>> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
>> +
>> +        # Preallocate filter may leak some internal clusters (for example, if
>> +        # guest write far over EOF, skipping some clusters - they will remain
>> +        # fallocated, preallocate filter don't care about such leaks, it drops
>> +        # only trailing preallocation.
> 
> True, but that isn’t what’s happening here.  (We only write 10M at 0, so
> there are no gaps.)  Why do we need this 1M margin?

We write 10M, but qcow2 also writes metadata as it wants

> 
>> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
>> +                        1024 * 1024)
> 
> [...]
> 
>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>> index ff59cfd2d4..15d5f9619b 100644
>> --- a/tests/qemu-iotests/group
>> +++ b/tests/qemu-iotests/group
>> @@ -307,6 +307,7 @@
>>   295 rw
>>   296 rw
>>   297 meta
>> +298 auto quick
> 
> I wouldn’t mark it as quick, there is at least one preallocate=full of
> 140M, and one of 40M, plus multiple 10M data writes and falloc
> preallocations.
> 
> Also, since you mark it as “auto”, have you run this test on all
> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
> preallocation behaves on macOS.  Just because that one was always a bit
> weird about not-really-data areas.
> 

Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this.. If I don't put new test in "auto", is there any chance that test would be automatically run somewhere?


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 12/15] scripts/simplebench: support iops
  2020-09-18 18:19 ` [PATCH v6 12/15] scripts/simplebench: support iops Vladimir Sementsov-Ogievskiy
@ 2020-09-25  8:54   ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2020-09-25  8:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 2157 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Support benchmarks returning not seconds but iops. We'll use it for
> further new test.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  scripts/simplebench/simplebench.py | 35 +++++++++++++++++++++++-------
>  1 file changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
> index 59e7314ff6..716d7fe9b2 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py

[...]

> @@ -34,6 +37,7 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
>  
>      Returns dict with the following fields:
>          'runs':     list of test_func results
> +        'dimension': dimension of results, may be 'seconds' or 'iops'
>          'average':  average seconds per run (exists only if at least one run

s/seconds/value/ (or something like that)

>                      succeeded)
>          'delta':    maximum delta between test_func result and the average
> @@ -54,11 +58,20 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
>  
>      result = {'runs': runs}
>  
> -    successed = [r for r in runs if ('seconds' in r)]
> +    successed = [r for r in runs if ('seconds' in r or 'iops' in r)]
>      if successed:

((Pre-existing, but I feel the urge to point it that it should be
“succeeded”.  (Or perhaps “successes”.)

Sorry, not something that should be fixed here, but I just couldn’t
contain myself.))

> -        avg = sum(r['seconds'] for r in successed) / len(successed)
> +        dim = 'iops' if ('iops' in successed[0]) else 'seconds'

I think this line should be dropped, because it’s obsoleted by the
if-else that follows.

> +        if 'iops' in successed[0]:
> +            assert all('iops' in r for r in successed)
> +            dim = 'iops'
> +        else:
> +            assert all('seconds' in r for r in successed)
> +            assert all('iops' not in r for r in successed)
> +            dim = 'seconds'

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-25  8:49     ` Vladimir Sementsov-Ogievskiy
@ 2020-09-25  9:11       ` Max Reitz
  2020-09-25 15:11         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2020-09-25  9:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 7071 bytes --]

On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 11:26, Max Reitz wrote:
>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   tests/qemu-iotests/298     | 186 +++++++++++++++++++++++++++++++++++++
>>>   tests/qemu-iotests/298.out |   5 +
>>>   tests/qemu-iotests/group   |   1 +
>>>   3 files changed, 192 insertions(+)
>>>   create mode 100644 tests/qemu-iotests/298
>>>   create mode 100644 tests/qemu-iotests/298.out

[...]

>>> +class TestTruncate(iotests.QMPTestCase):
>>
>> The same decorator could be placed here, although this class doesn’t
>> start a VM, and so is unaffected by the allowlist.  Still may be
>> relevant in case of block modules, I don’t know.
> 
> Or just global test skip at file top

Hm.  Like verify_quorum()?  Is there a generic function for that already?

[...]

>>> +        # Probably we'll want preallocate filter to keep align to
>>> cluster when
>>> +        # shrink preallocation, so, ignore small differece
>>> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
>>> +
>>> +        # Preallocate filter may leak some internal clusters (for
>>> example, if
>>> +        # guest write far over EOF, skipping some clusters - they
>>> will remain
>>> +        # fallocated, preallocate filter don't care about such
>>> leaks, it drops
>>> +        # only trailing preallocation.
>>
>> True, but that isn’t what’s happening here.  (We only write 10M at 0, so
>> there are no gaps.)  Why do we need this 1M margin?
> 
> We write 10M, but qcow2 also writes metadata as it wants

Ah, yes, sure.  Shouldn’t result in 1M, but why not.

>>> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
>>> +                        1024 * 1024)
>>
>> [...]
>>
>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>> index ff59cfd2d4..15d5f9619b 100644
>>> --- a/tests/qemu-iotests/group
>>> +++ b/tests/qemu-iotests/group
>>> @@ -307,6 +307,7 @@
>>>   295 rw
>>>   296 rw
>>>   297 meta
>>> +298 auto quick
>>
>> I wouldn’t mark it as quick, there is at least one preallocate=full of
>> 140M, and one of 40M, plus multiple 10M data writes and falloc
>> preallocations.
>>
>> Also, since you mark it as “auto”, have you run this test on all
>> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
>> preallocation behaves on macOS.  Just because that one was always a bit
>> weird about not-really-data areas.
>>
> 
> Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this..

Well, someone has to do it.  The background story is that tests are
added to auto all the time (because “why not”), and then they fail on
BSD or macOS.  We have BSD docker test build targets at least, so they
can be easily tested.  (Well, it takes like half an hour, but you know.)

(We don’t have macOS builds, as far as I can tell, but I personally
don’t even know why we run the iotests on macOS at all.  (Well, I also
wonder about the BSDs, but given the test build targets, I shouldn’t
complain, I suppose.))

(Though macOS isn’t part of the gating CI, is it?  I seem to remember
macOS errors are generally only reported to me half a week after the
pull request is merged, which is even worse.)

Anyway.  I just ran the test for OpenBSD
(EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
   make vm-build-openbsd)
and got some failures:

--- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep
25 07:10:31 2020
+++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad
Fri Sep 25 08:57:56 2020
@@ -1,5 +1,67 @@
-.............
+qemu-io: Failed to resize underlying file: Unsupported preallocation
mode: falloc
+qemu-io: Failed to resize underlying file: Unsupported preallocation
mode: falloc
+FFFF.F...F...
+======================================================================
+FAIL: test_external_snapshot (__main__.TestPreallocateFilter)
 ----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "298", line 81, in test_external_snapshot
+    self.test_prealloc()
+  File "298", line 78, in test_prealloc
+    self.check_big()
+  File "298", line 48, in check_big
+    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+AssertionError: False is not true
+
+======================================================================
+FAIL: test_prealloc (__main__.TestPreallocateFilter)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "298", line 78, in test_prealloc
+    self.check_big()
+  File "298", line 48, in check_big
+    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+AssertionError: False is not true
+
+======================================================================
+FAIL: test_reopen_opts (__main__.TestPreallocateFilter)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "298", line 119, in test_reopen_opts
+    self.assertTrue(os.path.getsize(disk) == 25 * MiB)
+AssertionError: False is not true
+
+======================================================================
+FAIL: test_qemu_img (__main__.TestQemuImg)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "298", line 61, in test_qemu_img
+    self.check_big()
+  File "298", line 48, in check_big
+    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
+AssertionError: False is not true
+
+======================================================================
+FAIL: test_truncate_inside_preallocated_area__falloc
(__main__.TestTruncate)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "298", line 161, in test_truncate_inside_preallocated_area__falloc
+    self.do_test('falloc', '50M')
+  File "298", line 135, in do_test
+    self.assertEqual(ret, 0)
+AssertionError: 1 != 0
+
+======================================================================
+FAIL: test_truncate_over_preallocated_area__falloc (__main__.TestTruncate)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "298", line 173, in test_truncate_over_preallocated_area__falloc
+    self.do_test('falloc', '150M')
+  File "298", line 135, in do_test
+    self.assertEqual(ret, 0)
+AssertionError: 1 != 0
+
+----------------------------------------------------------------------
 Ran 13 tests

-OK
+FAILED (failures=6)

> If I don't put new test in "auto", is there any chance that test would
> be automatically run somewhere?

I run all tests before pull requests at least.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 13/15] scripts/simplebench: improve view of ascii table
  2020-09-18 18:19 ` [PATCH v6 13/15] scripts/simplebench: improve view of ascii table Vladimir Sementsov-Ogievskiy
@ 2020-09-25  9:31   ` Max Reitz
  2020-09-25 16:58     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2020-09-25  9:31 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 1875 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Introduce dynamic float precision and use percentage to show delta.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  scripts/simplebench/simplebench.py | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
> index 716d7fe9b2..56d3a91ea2 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py
> @@ -79,10 +79,34 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
>      return result
>  
>  
> +def format_float(x):
> +    res = round(x)
> +    if res >= 100:
> +        return str(res)
> +
> +    res = f'{x:.1f}'
> +    if len(res) >= 4:
> +        return res
> +
> +    return f'{x:.2f}'

This itches me to ask for some log() calculation.

Like:

%.*f' % (math.ceil(math.log10(99.95 / x)), x)

(For three significant digits)

> +def format_percent(x):
> +    x *= 100
> +
> +    res = round(x)
> +    if res >= 10:
> +        return str(res)
> +
> +    return f'{x:.1f}' if res >= 1 else f'{x:.2f}'

Same here.  (Also, why not append a % sign in this function?)

>  def ascii_one(result):
>      """Return ASCII representation of bench_one() returned dict."""
>      if 'average' in result:
> -        s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
> +        avg = result['average']
> +        delta_pr = result['delta'] / avg
> +        s = f'{format_float(avg)}±{format_percent(delta_pr)}%'

Pre-existing, but isn’t the ± range generally assumed to be the standard
deviation?

Max

>          if 'n-failed' in result:
>              s += '\n({} failed)'.format(result['n-failed'])
>          return s
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line
  2020-09-18 18:19 ` [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line Vladimir Sementsov-Ogievskiy
@ 2020-09-25 10:24   ` Max Reitz
  2020-09-25 17:13     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 37+ messages in thread
From: Max Reitz @ 2020-09-25 10:24 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 2175 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Performance improvements / degradations are usually discussed in
> percentage. Let's make the script calculate it for us.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 4 deletions(-)
> 
> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
> index 56d3a91ea2..0ff05a38b8 100644
> --- a/scripts/simplebench/simplebench.py
> +++ b/scripts/simplebench/simplebench.py

[...]

> +            for j in range(0, i):
> +                env_j = results['envs'][j]
> +                res_j = case_results[env_j['id']]
> +
> +                if 'average' not in res_j:
> +                    # Failed result
> +                    cell += ' --'
> +                    continue
> +
> +                col_j = chr(ord('A') + j)
> +                avg_j = res_j['average']
> +                delta = (res['average'] - avg_j) / avg_j * 100

I was wondering why you’d subtract, when percentage differences usually
mean a quotient.  Then I realized that this would usually be written as:

(res['average'] / avg_j - 1) * 100

> +                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100

Why not use the new format_percent for both cases?

> +                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'

I don’t know what I should think about ±delta_delta.  If I saw “Compared
to run A, this is +42.1%±2.0%”, I would think that you calculated the
difference between each run result, and then based on that array
calculated average and standard deviation.

Furthermore, I don’t even know what the delta_delta is supposed to tell
you.  It isn’t even a delta_delta, it’s an average_delta.

The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
And that might be presented perhaps like “+42.1% Δ± +2.0%” (if delta
were the SD, “Δx̅=+42.1% Δσ=+2.0%” would also work; although, again, I do
interpret ± as the SD anyway).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py
  2020-09-18 18:19 ` [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py Vladimir Sementsov-Ogievskiy
@ 2020-09-25 11:25   ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2020-09-25 11:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 697 bytes --]

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> Benchmark for new preallocate filter.
> 
> Example usage:
>     ./bench_prealloc.py ../../build/qemu-img \
>         ssd-ext4:/path/to/mount/point \
>         ssd-xfs:/path2 hdd-ext4:/path3 hdd-xfs:/path4
> 
> The benchmark shows performance improvement (or degradation) when use
> new preallocate filter with qcow2 image.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  scripts/simplebench/bench_prealloc.py | 128 ++++++++++++++++++++++++++
>  1 file changed, 128 insertions(+)
>  create mode 100755 scripts/simplebench/bench_prealloc.py

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-25  9:11       ` Max Reitz
@ 2020-09-25 15:11         ` Vladimir Sementsov-Ogievskiy
  2020-09-25 15:32           ` Vladimir Sementsov-Ogievskiy
  2020-10-01  8:41           ` Thomas Huth
  0 siblings, 2 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-25 15:11 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

25.09.2020 12:11, Max Reitz wrote:
> On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:
>> 25.09.2020 11:26, Max Reitz wrote:
>>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    tests/qemu-iotests/298     | 186 +++++++++++++++++++++++++++++++++++++
>>>>    tests/qemu-iotests/298.out |   5 +
>>>>    tests/qemu-iotests/group   |   1 +
>>>>    3 files changed, 192 insertions(+)
>>>>    create mode 100644 tests/qemu-iotests/298
>>>>    create mode 100644 tests/qemu-iotests/298.out
> 
> [...]
> 
>>>> +class TestTruncate(iotests.QMPTestCase):
>>>
>>> The same decorator could be placed here, although this class doesn’t
>>> start a VM, and so is unaffected by the allowlist.  Still may be
>>> relevant in case of block modules, I don’t know.
>>
>> Or just global test skip at file top
> 
> Hm.  Like verify_quorum()?  Is there a generic function for that already?
> 
> [...]
> 
>>>> +        # Probably we'll want preallocate filter to keep align to
>>>> cluster when
>>>> +        # shrink preallocation, so, ignore small differece
>>>> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
>>>> +
>>>> +        # Preallocate filter may leak some internal clusters (for
>>>> example, if
>>>> +        # guest write far over EOF, skipping some clusters - they
>>>> will remain
>>>> +        # fallocated, preallocate filter don't care about such
>>>> leaks, it drops
>>>> +        # only trailing preallocation.
>>>
>>> True, but that isn’t what’s happening here.  (We only write 10M at 0, so
>>> there are no gaps.)  Why do we need this 1M margin?
>>
>> We write 10M, but qcow2 also writes metadata as it wants
> 
> Ah, yes, sure.  Shouldn’t result in 1M, but why not.
> 
>>>> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
>>>> +                        1024 * 1024)
>>>
>>> [...]
>>>
>>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>>> index ff59cfd2d4..15d5f9619b 100644
>>>> --- a/tests/qemu-iotests/group
>>>> +++ b/tests/qemu-iotests/group
>>>> @@ -307,6 +307,7 @@
>>>>    295 rw
>>>>    296 rw
>>>>    297 meta
>>>> +298 auto quick
>>>
>>> I wouldn’t mark it as quick, there is at least one preallocate=full of
>>> 140M, and one of 40M, plus multiple 10M data writes and falloc
>>> preallocations.
>>>
>>> Also, since you mark it as “auto”, have you run this test on all
>>> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
>>> preallocation behaves on macOS.  Just because that one was always a bit
>>> weird about not-really-data areas.
>>>
>>
>> Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this..
> 
> Well, someone has to do it.  The background story is that tests are
> added to auto all the time (because “why not”), and then they fail on
> BSD or macOS.  We have BSD docker test build targets at least, so they
> can be easily tested.  (Well, it takes like half an hour, but you know.)
> 
> (We don’t have macOS builds, as far as I can tell, but I personally
> don’t even know why we run the iotests on macOS at all.  (Well, I also
> wonder about the BSDs, but given the test build targets, I shouldn’t
> complain, I suppose.))
> 
> (Though macOS isn’t part of the gating CI, is it?  I seem to remember
> macOS errors are generally only reported to me half a week after the
> pull request is merged, which is even worse.)
> 
> Anyway.  I just ran the test for OpenBSD
> (EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
>     make vm-build-openbsd)

Oh, I didn't know that it's so simple. What another things you are running before sending a PULL?

> and got some failures:
> 
> --- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep
> 25 07:10:31 2020
> +++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad
> Fri Sep 25 08:57:56 2020
> @@ -1,5 +1,67 @@
> -.............
> +qemu-io: Failed to resize underlying file: Unsupported preallocation
> mode: falloc
> +qemu-io: Failed to resize underlying file: Unsupported preallocation
> mode: falloc
> +FFFF.F...F...
> +======================================================================
> +FAIL: test_external_snapshot (__main__.TestPreallocateFilter)
>   ----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "298", line 81, in test_external_snapshot
> +    self.test_prealloc()
> +  File "298", line 78, in test_prealloc
> +    self.check_big()
> +  File "298", line 48, in check_big
> +    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
> +AssertionError: False is not true
> +
> +======================================================================
> +FAIL: test_prealloc (__main__.TestPreallocateFilter)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "298", line 78, in test_prealloc
> +    self.check_big()
> +  File "298", line 48, in check_big
> +    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
> +AssertionError: False is not true
> +
> +======================================================================
> +FAIL: test_reopen_opts (__main__.TestPreallocateFilter)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "298", line 119, in test_reopen_opts
> +    self.assertTrue(os.path.getsize(disk) == 25 * MiB)
> +AssertionError: False is not true
> +
> +======================================================================
> +FAIL: test_qemu_img (__main__.TestQemuImg)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "298", line 61, in test_qemu_img
> +    self.check_big()
> +  File "298", line 48, in check_big
> +    self.assertTrue(os.path.getsize(disk) > 100 * MiB)
> +AssertionError: False is not true
> +
> +======================================================================
> +FAIL: test_truncate_inside_preallocated_area__falloc
> (__main__.TestTruncate)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "298", line 161, in test_truncate_inside_preallocated_area__falloc
> +    self.do_test('falloc', '50M')
> +  File "298", line 135, in do_test
> +    self.assertEqual(ret, 0)
> +AssertionError: 1 != 0
> +
> +======================================================================
> +FAIL: test_truncate_over_preallocated_area__falloc (__main__.TestTruncate)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "298", line 173, in test_truncate_over_preallocated_area__falloc
> +    self.do_test('falloc', '150M')
> +  File "298", line 135, in do_test
> +    self.assertEqual(ret, 0)
> +AssertionError: 1 != 0
> +
> +----------------------------------------------------------------------
>   Ran 13 tests
> 
> -OK
> +FAILED (failures=6)
> 
>> If I don't put new test in "auto", is there any chance that test would
>> be automatically run somewhere?
> 
> I run all tests before pull requests at least.
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-25 15:11         ` Vladimir Sementsov-Ogievskiy
@ 2020-09-25 15:32           ` Vladimir Sementsov-Ogievskiy
  2020-10-01  7:40             ` Max Reitz
  2020-10-01  8:41           ` Thomas Huth
  1 sibling, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-25 15:32 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

25.09.2020 18:11, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 12:11, Max Reitz wrote:
>> On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:
>>> 25.09.2020 11:26, Max Reitz wrote:
>>>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    tests/qemu-iotests/298     | 186 +++++++++++++++++++++++++++++++++++++
>>>>>    tests/qemu-iotests/298.out |   5 +
>>>>>    tests/qemu-iotests/group   |   1 +
>>>>>    3 files changed, 192 insertions(+)
>>>>>    create mode 100644 tests/qemu-iotests/298
>>>>>    create mode 100644 tests/qemu-iotests/298.out
>>
>> [...]
>>
>>>>> +class TestTruncate(iotests.QMPTestCase):
>>>>
>>>> The same decorator could be placed here, although this class doesn’t
>>>> start a VM, and so is unaffected by the allowlist.  Still may be
>>>> relevant in case of block modules, I don’t know.
>>>
>>> Or just global test skip at file top
>>
>> Hm.  Like verify_quorum()?  Is there a generic function for that already?
>>
>> [...]
>>
>>>>> +        # Probably we'll want preallocate filter to keep align to
>>>>> cluster when
>>>>> +        # shrink preallocation, so, ignore small differece
>>>>> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 * 1024)
>>>>> +
>>>>> +        # Preallocate filter may leak some internal clusters (for
>>>>> example, if
>>>>> +        # guest write far over EOF, skipping some clusters - they
>>>>> will remain
>>>>> +        # fallocated, preallocate filter don't care about such
>>>>> leaks, it drops
>>>>> +        # only trailing preallocation.
>>>>
>>>> True, but that isn’t what’s happening here.  (We only write 10M at 0, so
>>>> there are no gaps.)  Why do we need this 1M margin?
>>>
>>> We write 10M, but qcow2 also writes metadata as it wants
>>
>> Ah, yes, sure.  Shouldn’t result in 1M, but why not.
>>
>>>>> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) * 512,
>>>>> +                        1024 * 1024)
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>>>> index ff59cfd2d4..15d5f9619b 100644
>>>>> --- a/tests/qemu-iotests/group
>>>>> +++ b/tests/qemu-iotests/group
>>>>> @@ -307,6 +307,7 @@
>>>>>    295 rw
>>>>>    296 rw
>>>>>    297 meta
>>>>> +298 auto quick
>>>>
>>>> I wouldn’t mark it as quick, there is at least one preallocate=full of
>>>> 140M, and one of 40M, plus multiple 10M data writes and falloc
>>>> preallocations.
>>>>
>>>> Also, since you mark it as “auto”, have you run this test on all
>>>> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
>>>> preallocation behaves on macOS.  Just because that one was always a bit
>>>> weird about not-really-data areas.
>>>>
>>>
>>> Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this..

Sorry, I see now that it sounds rude.

>>
>> Well, someone has to do it.  The background story is that tests are
>> added to auto all the time (because “why not”), and then they fail on
>> BSD or macOS.  We have BSD docker test build targets at least, so they
>> can be easily tested.  (Well, it takes like half an hour, but you know.)
>>
>> (We don’t have macOS builds, as far as I can tell, but I personally
>> don’t even know why we run the iotests on macOS at all.  (Well, I also
>> wonder about the BSDs, but given the test build targets, I shouldn’t
>> complain, I suppose.))
>>
>> (Though macOS isn’t part of the gating CI, is it?  I seem to remember
>> macOS errors are generally only reported to me half a week after the
>> pull request is merged, which is even worse.)
>>
>> Anyway.  I just ran the test for OpenBSD
>> (EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
>>     make vm-build-openbsd)
> 
> Oh, I didn't know that it's so simple. What another things you are running before sending a PULL?
> 
>> and got some failures:
>>
>> --- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep
>> 25 07:10:31 2020
>> +++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad
>> Fri Sep 25 08:57:56 2020
>> @@ -1,5 +1,67 @@
>> -.............
>> +qemu-io: Failed to resize underlying file: Unsupported preallocation
>> mode: falloc

[..]

>> -OK
>> +FAILED (failures=6)
>>
>>> If I don't put new test in "auto", is there any chance that test would
>>> be automatically run somewhere?
>>
>> I run all tests before pull requests at least.
>>

OK, so it doesn't work on BSD at all. I don't really want to investigate, but seems it's because of absence of fallocate. So let's drop "auto" group.

Another thing: maybe, add auto-linux test group for running only in linux-build? So, new tests will be added to it because why not, and we will not bother with BSD and MacOS?

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 13/15] scripts/simplebench: improve view of ascii table
  2020-09-25  9:31   ` Max Reitz
@ 2020-09-25 16:58     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-25 16:58 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

25.09.2020 12:31, Max Reitz wrote:
> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>> Introduce dynamic float precision and use percentage to show delta.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   scripts/simplebench/simplebench.py | 26 +++++++++++++++++++++++++-
>>   1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
>> index 716d7fe9b2..56d3a91ea2 100644
>> --- a/scripts/simplebench/simplebench.py
>> +++ b/scripts/simplebench/simplebench.py
>> @@ -79,10 +79,34 @@ def bench_one(test_func, test_env, test_case, count=5, initial_run=True):
>>       return result
>>   
>>   
>> +def format_float(x):
>> +    res = round(x)
>> +    if res >= 100:
>> +        return str(res)
>> +
>> +    res = f'{x:.1f}'
>> +    if len(res) >= 4:
>> +        return res
>> +
>> +    return f'{x:.2f}'
> 
> This itches me to ask for some log() calculation.
> 
> Like:
> 
> %.*f' % (math.ceil(math.log10(99.95 / x)), x)
> 

Oh yes, that's cool.

> 
>> +def format_percent(x):
>> +    x *= 100
>> +
>> +    res = round(x)
>> +    if res >= 10:
>> +        return str(res)
>> +
>> +    return f'{x:.1f}' if res >= 1 else f'{x:.2f}'
> 
> Same here.  (Also, why not append a % sign in this function?)

OK

> 
>>   def ascii_one(result):
>>       """Return ASCII representation of bench_one() returned dict."""
>>       if 'average' in result:
>> -        s = '{:.2f} +- {:.2f}'.format(result['average'], result['delta'])
>> +        avg = result['average']
>> +        delta_pr = result['delta'] / avg
>> +        s = f'{format_float(avg)}±{format_percent(delta_pr)}%'
> 
> Pre-existing, but isn’t the ± range generally assumed to be the standard
> deviation?
> 

Hmm. Actually, why not, let's just use standard deviation. I wanted to show maximum deviation, not mean, to not miss some bugs in experiment (big deviation of one test run). Still, seems standard deviation is good enough in it.

> 
>>           if 'n-failed' in result:
>>               s += '\n({} failed)'.format(result['n-failed'])
>>           return s
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line
  2020-09-25 10:24   ` Max Reitz
@ 2020-09-25 17:13     ` Vladimir Sementsov-Ogievskiy
  2020-10-01  8:22       ` Max Reitz
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-25 17:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, armbru, eblake, fam, stefanha, kwolf, den

25.09.2020 13:24, Max Reitz wrote:
> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>> Performance improvements / degradations are usually discussed in
>> percentage. Let's make the script calculate it for us.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>>   1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/scripts/simplebench/simplebench.py b/scripts/simplebench/simplebench.py
>> index 56d3a91ea2..0ff05a38b8 100644
>> --- a/scripts/simplebench/simplebench.py
>> +++ b/scripts/simplebench/simplebench.py
> 
> [...]
> 
>> +            for j in range(0, i):
>> +                env_j = results['envs'][j]
>> +                res_j = case_results[env_j['id']]
>> +
>> +                if 'average' not in res_j:
>> +                    # Failed result
>> +                    cell += ' --'
>> +                    continue
>> +
>> +                col_j = chr(ord('A') + j)
>> +                avg_j = res_j['average']
>> +                delta = (res['average'] - avg_j) / avg_j * 100
> 
> I was wondering why you’d subtract, when percentage differences usually
> mean a quotient.  Then I realized that this would usually be written as:
> 
> (res['average'] / avg_j - 1) * 100
> 
>> +                delta_delta = (res['delta'] + res_j['delta']) / avg_j * 100
> 
> Why not use the new format_percent for both cases?

because I want less precision here

> 
>> +                cell += f' {col_j}{round(delta):+}±{round(delta_delta)}%'
> 
> I don’t know what I should think about ±delta_delta.  If I saw “Compared
> to run A, this is +42.1%±2.0%”, I would think that you calculated the
> difference between each run result, and then based on that array
> calculated average and standard deviation.
> 
> Furthermore, I don’t even know what the delta_delta is supposed to tell
> you.  It isn’t even a delta_delta, it’s an average_delta.

not avarage, but sum of errors. And it shows the error for the delta

> 
> The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.

and this shows nothing.

Assume we have = A = 10+-2 and B = 15+-2

The difference is (15-10)+-(2+2) = 5+-4.
And your formula will give (2/2 - 1) *100 = 0, which is wrong.

Anyway, my code is mess)

> And that might be presented perhaps like “+42.1% Δ± +2.0%” (if delta
> were the SD, “Δx̅=+42.1% Δσ=+2.0%” would also work; although, again, I do
> interpret ± as the SD anyway).
> 

I feel that I'm bad in statistics :( I'll learn a little and make a new version.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING
  2020-09-18 18:19 ` [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING Vladimir Sementsov-Ogievskiy
@ 2020-09-28 15:34   ` Alberto Garcia
  0 siblings, 0 replies; 37+ messages in thread
From: Alberto Garcia @ 2020-09-28 15:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, vsementsov, qemu-devel, armbru, stefanha, den, mreitz

On Fri 18 Sep 2020 08:19:37 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> 1. BDRV_REQ_NO_SERIALISING doesn't exist already, don't mention it.
>
> 2. We are going to add one more user of BDRV_REQ_SERIALISING, so
>    comment about backup becomes a bit confusing here. The use case in
>    backup is documented in block/backup.c, so let's just drop
>    duplication here.
>
> 3. The fact that BDRV_REQ_SERIALISING is only for write requests is
>    omitted. Add a note.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Alberto Garcia <berto@igalia.com>

Berto


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-25 15:32           ` Vladimir Sementsov-Ogievskiy
@ 2020-10-01  7:40             ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2020-10-01  7:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 6553 bytes --]

On 25.09.20 17:32, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 18:11, Vladimir Sementsov-Ogievskiy wrote:
>> 25.09.2020 12:11, Max Reitz wrote:
>>> On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:
>>>> 25.09.2020 11:26, Max Reitz wrote:
>>>>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>>>>> <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>    tests/qemu-iotests/298     | 186
>>>>>> +++++++++++++++++++++++++++++++++++++
>>>>>>    tests/qemu-iotests/298.out |   5 +
>>>>>>    tests/qemu-iotests/group   |   1 +
>>>>>>    3 files changed, 192 insertions(+)
>>>>>>    create mode 100644 tests/qemu-iotests/298
>>>>>>    create mode 100644 tests/qemu-iotests/298.out
>>>
>>> [...]
>>>
>>>>>> +class TestTruncate(iotests.QMPTestCase):
>>>>>
>>>>> The same decorator could be placed here, although this class doesn’t
>>>>> start a VM, and so is unaffected by the allowlist.  Still may be
>>>>> relevant in case of block modules, I don’t know.
>>>>
>>>> Or just global test skip at file top
>>>
>>> Hm.  Like verify_quorum()?  Is there a generic function for that
>>> already?
>>>
>>> [...]
>>>
>>>>>> +        # Probably we'll want preallocate filter to keep align to
>>>>>> cluster when
>>>>>> +        # shrink preallocation, so, ignore small differece
>>>>>> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 *
>>>>>> 1024)
>>>>>> +
>>>>>> +        # Preallocate filter may leak some internal clusters (for
>>>>>> example, if
>>>>>> +        # guest write far over EOF, skipping some clusters - they
>>>>>> will remain
>>>>>> +        # fallocated, preallocate filter don't care about such
>>>>>> leaks, it drops
>>>>>> +        # only trailing preallocation.
>>>>>
>>>>> True, but that isn’t what’s happening here.  (We only write 10M at
>>>>> 0, so
>>>>> there are no gaps.)  Why do we need this 1M margin?
>>>>
>>>> We write 10M, but qcow2 also writes metadata as it wants
>>>
>>> Ah, yes, sure.  Shouldn’t result in 1M, but why not.
>>>
>>>>>> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) *
>>>>>> 512,
>>>>>> +                        1024 * 1024)
>>>>>
>>>>> [...]
>>>>>
>>>>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>>>>> index ff59cfd2d4..15d5f9619b 100644
>>>>>> --- a/tests/qemu-iotests/group
>>>>>> +++ b/tests/qemu-iotests/group
>>>>>> @@ -307,6 +307,7 @@
>>>>>>    295 rw
>>>>>>    296 rw
>>>>>>    297 meta
>>>>>> +298 auto quick
>>>>>
>>>>> I wouldn’t mark it as quick, there is at least one preallocate=full of
>>>>> 140M, and one of 40M, plus multiple 10M data writes and falloc
>>>>> preallocations.
>>>>>
>>>>> Also, since you mark it as “auto”, have you run this test on all
>>>>> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
>>>>> preallocation behaves on macOS.  Just because that one was always a
>>>>> bit
>>>>> weird about not-really-data areas.
>>>>>
>>>>
>>>> Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this..
> 
> Sorry, I see now that it sounds rude.

I found it completely understandable, because I share the same
sentiment.  It’s the reason I’m so hesitant adding tests to auto.

>>> Well, someone has to do it.  The background story is that tests are
>>> added to auto all the time (because “why not”), and then they fail on
>>> BSD or macOS.  We have BSD docker test build targets at least, so they
>>> can be easily tested.  (Well, it takes like half an hour, but you know.)
>>>
>>> (We don’t have macOS builds, as far as I can tell, but I personally
>>> don’t even know why we run the iotests on macOS at all.  (Well, I also
>>> wonder about the BSDs, but given the test build targets, I shouldn’t
>>> complain, I suppose.))
>>>
>>> (Though macOS isn’t part of the gating CI, is it?  I seem to remember
>>> macOS errors are generally only reported to me half a week after the
>>> pull request is merged, which is even worse.)
>>>
>>> Anyway.  I just ran the test for OpenBSD
>>> (EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
>>>     make vm-build-openbsd)
>>
>> Oh, I didn't know that it's so simple.

It kind of is simple, but it still takes so long that I don’t consider
it simple.

>> What another things you are
>> running before sending a PULL?

Actually, I’m not running any of the vm-build-* targets.  (If I did, I
wouldn’t have to ask you whether you did, because I’d just run them
anyway and then tell you about any potential failures.)

I compile everything on Fedora and Arch (x86-64), -m32 and -m64, clang
and gcc, and one time with mingw.  I run make check on everything but
mingw, and then run the following iotests on Fedora gcc-m64 and Arch
clang-m32 (on tmpfs):

-qcow2 -x auto, -qcow2 -o compat=0.10, -qcow2 -o refcount_bits=1,
-qcow2 -o data_file='$TEST_IMG.ext_data_file', -nbd, -raw, -luks, -vmdk,
-vhdx, -qcow, -vdi, -vpc, -qed, -cloop, -parallels, -bochs

And on non-tmpfs: -c none -qcow2 142 199

(At some point that meant that all iotests that don’t require root are
at least run once.  I should check whether that’s still the case, I
suppose...)

>>> and got some failures:
>>>
>>> --- /home/qemu/qemu-test.PGo2ls/src/tests/qemu-iotests/298.out  Fri Sep
>>> 25 07:10:31 2020
>>> +++ /home/qemu/qemu-test.PGo2ls/build/tests/qemu-iotests/298.out.bad
>>> Fri Sep 25 08:57:56 2020
>>> @@ -1,5 +1,67 @@
>>> -.............
>>> +qemu-io: Failed to resize underlying file: Unsupported preallocation
>>> mode: falloc
> 
> [..]
> 
>>> -OK
>>> +FAILED (failures=6)
>>>
>>>> If I don't put new test in "auto", is there any chance that test would
>>>> be automatically run somewhere?
>>>
>>> I run all tests before pull requests at least.
>>>
> 
> OK, so it doesn't work on BSD at all. I don't really want to
> investigate, but seems it's because of absence of fallocate. So let's
> drop "auto" group.

I’d be OK with that.

> Another thing: maybe, add auto-linux test group for running only in
> linux-build? So, new tests will be added to it because why not, and we
> will not bother with BSD and MacOS?

We have _supported_os in bash tests and supported_platforms in Python
tests, so if you want a test in auto but run it only on Linux, you can
specify that there.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line
  2020-09-25 17:13     ` Vladimir Sementsov-Ogievskiy
@ 2020-10-01  8:22       ` Max Reitz
  0 siblings, 0 replies; 37+ messages in thread
From: Max Reitz @ 2020-10-01  8:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den


[-- Attachment #1.1: Type: text/plain, Size: 4223 bytes --]

On 25.09.20 19:13, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 13:24, Max Reitz wrote:
>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>> Performance improvements / degradations are usually discussed in
>>> percentage. Let's make the script calculate it for us.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>>>   1 file changed, 42 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/scripts/simplebench/simplebench.py
>>> b/scripts/simplebench/simplebench.py
>>> index 56d3a91ea2..0ff05a38b8 100644
>>> --- a/scripts/simplebench/simplebench.py
>>> +++ b/scripts/simplebench/simplebench.py
>>
>> [...]
>>
>>> +            for j in range(0, i):
>>> +                env_j = results['envs'][j]
>>> +                res_j = case_results[env_j['id']]
>>> +
>>> +                if 'average' not in res_j:
>>> +                    # Failed result
>>> +                    cell += ' --'
>>> +                    continue
>>> +
>>> +                col_j = chr(ord('A') + j)
>>> +                avg_j = res_j['average']
>>> +                delta = (res['average'] - avg_j) / avg_j * 100
>>
>> I was wondering why you’d subtract, when percentage differences usually
>> mean a quotient.  Then I realized that this would usually be written as:
>>
>> (res['average'] / avg_j - 1) * 100
>>
>>> +                delta_delta = (res['delta'] + res_j['delta']) /
>>> avg_j * 100
>>
>> Why not use the new format_percent for both cases?
> 
> because I want less precision here
> 
>>
>>> +                cell += f'
>>> {col_j}{round(delta):+}±{round(delta_delta)}%'
>>
>> I don’t know what I should think about ±delta_delta.  If I saw “Compared
>> to run A, this is +42.1%±2.0%”, I would think that you calculated the
>> difference between each run result, and then based on that array
>> calculated average and standard deviation.
>>
>> Furthermore, I don’t even know what the delta_delta is supposed to tell
>> you.  It isn’t even a delta_delta, it’s an average_delta.
> 
> not avarage, but sum of errors. And it shows the error for the delta
> 
>>
>> The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
> 
> and this shows nothing.
> 
> Assume we have = A = 10+-2 and B = 15+-2
> 
> The difference is (15-10)+-(2+2) = 5+-4.
> And your formula will give (2/2 - 1) *100 = 0, which is wrong.

Well, it’s the difference in delta (whatever “delta” means here).  I
wouldn’t call it wrong.

We want to compare two test runs, so if both have the same delta, then
the difference in delta is 0.  That’s how understood it, hence my “Δ±”
notation below.  (This may be useful information, because perhaps one
may consider a big delta bad, and so if one run has less delta than
another one, that may be considered a better outcoming.  Comparing
deltas has a purpose.)

I see I understood your intentions wrong, though; you want to just give
an error estimate for the difference of the means of both runs.  I have
to admit I don’t know how that works exactly, and it will probably
heavily depend on what “delta” is.

(Googling suggests that for the standard deviation, one would square
each SD to get the variance back, then divide by the respective sample
size, add, and take the square root.  But that’s for when you have two
distributions that you want to combine, but we want to compare here...
http://homework.uoregon.edu/pub/class/es202/ztest.html seems to suggest
the same for such a comparison, though.  I don’t know.)

(As for your current version, after more thinking it does seem right
when delta is the maximum deviation.  Or perhaps the deltas shouldn’t be
added then but the maximum should be used?  I’m just not sure.)

((Perhaps it doesn’t even matter.  “Don’t believe any statistics you
haven’t forged yourself”, and so on.))

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver
  2020-09-25 15:11         ` Vladimir Sementsov-Ogievskiy
  2020-09-25 15:32           ` Vladimir Sementsov-Ogievskiy
@ 2020-10-01  8:41           ` Thomas Huth
  1 sibling, 0 replies; 37+ messages in thread
From: Thomas Huth @ 2020-10-01  8:41 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: fam, kwolf, qemu-devel, armbru, stefanha, den

On 25/09/2020 17.11, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 12:11, Max Reitz wrote:
>> On 25.09.20 10:49, Vladimir Sementsov-Ogievskiy wrote:
>>> 25.09.2020 11:26, Max Reitz wrote:
>>>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    tests/qemu-iotests/298     | 186
>>>>> +++++++++++++++++++++++++++++++++++++
>>>>>    tests/qemu-iotests/298.out |   5 +
>>>>>    tests/qemu-iotests/group   |   1 +
>>>>>    3 files changed, 192 insertions(+)
>>>>>    create mode 100644 tests/qemu-iotests/298
>>>>>    create mode 100644 tests/qemu-iotests/298.out
>>
>> [...]
>>
>>>>> +class TestTruncate(iotests.QMPTestCase):
>>>>
>>>> The same decorator could be placed here, although this class doesn’t
>>>> start a VM, and so is unaffected by the allowlist.  Still may be
>>>> relevant in case of block modules, I don’t know.
>>>
>>> Or just global test skip at file top
>>
>> Hm.  Like verify_quorum()?  Is there a generic function for that already?
>>
>> [...]
>>
>>>>> +        # Probably we'll want preallocate filter to keep align to
>>>>> cluster when
>>>>> +        # shrink preallocation, so, ignore small differece
>>>>> +        self.assertLess(abs(stat.st_size - refstat.st_size), 64 *
>>>>> 1024)
>>>>> +
>>>>> +        # Preallocate filter may leak some internal clusters (for
>>>>> example, if
>>>>> +        # guest write far over EOF, skipping some clusters - they
>>>>> will remain
>>>>> +        # fallocated, preallocate filter don't care about such
>>>>> leaks, it drops
>>>>> +        # only trailing preallocation.
>>>>
>>>> True, but that isn’t what’s happening here.  (We only write 10M at
>>>> 0, so
>>>> there are no gaps.)  Why do we need this 1M margin?
>>>
>>> We write 10M, but qcow2 also writes metadata as it wants
>>
>> Ah, yes, sure.  Shouldn’t result in 1M, but why not.
>>
>>>>> +        self.assertLess(abs(stat.st_blocks - refstat.st_blocks) *
>>>>> 512,
>>>>> +                        1024 * 1024)
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
>>>>> index ff59cfd2d4..15d5f9619b 100644
>>>>> --- a/tests/qemu-iotests/group
>>>>> +++ b/tests/qemu-iotests/group
>>>>> @@ -307,6 +307,7 @@
>>>>>    295 rw
>>>>>    296 rw
>>>>>    297 meta
>>>>> +298 auto quick
>>>>
>>>> I wouldn’t mark it as quick, there is at least one preallocate=full of
>>>> 140M, and one of 40M, plus multiple 10M data writes and falloc
>>>> preallocations.
>>>>
>>>> Also, since you mark it as “auto”, have you run this test on all
>>>> CI-relevant hosts?  (Among other things I can’t predict) I wonder how
>>>> preallocation behaves on macOS.  Just because that one was always a bit
>>>> weird about not-really-data areas.
>>>>
>>>
>>> Ofcourse, I didn't run on all hosts. I'm a bit out of sync about this..
>>
>> Well, someone has to do it.  The background story is that tests are
>> added to auto all the time (because “why not”), and then they fail on
>> BSD or macOS.  We have BSD docker test build targets at least, so they
>> can be easily tested.  (Well, it takes like half an hour, but you know.)
>>
>> (We don’t have macOS builds, as far as I can tell, but I personally
>> don’t even know why we run the iotests on macOS at all.  (Well, I also
>> wonder about the BSDs, but given the test build targets, I shouldn’t
>> complain, I suppose.))
>>
>> (Though macOS isn’t part of the gating CI, is it?  I seem to remember
>> macOS errors are generally only reported to me half a week after the
>> pull request is merged, which is even worse.)
>>
>> Anyway.  I just ran the test for OpenBSD
>> (EXTRA_CONFIGURE_OPTS='--target-list=x86_64-softmmu' \
>>     make vm-build-openbsd)
> 
> Oh, I didn't know that it's so simple.
Running the tests on macOS is also quite simple if you have a github
account. You simply add the "Cirrus-CI" from the marketplace to your
forked qemu repository there, and then push your work to a branch in
that repo. Cirrus-CI then automatically tests your stuff on macOS (and
also FreeBSD), e.g.:

 https://cirrus-ci.com/build/4961684689256448

 Thomas



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-10-01  8:42 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-18 18:19 [PATCH v6 00/15] preallocate filter Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 01/15] block: simplify comment to BDRV_REQ_SERIALISING Vladimir Sementsov-Ogievskiy
2020-09-28 15:34   ` Alberto Garcia
2020-09-18 18:19 ` [PATCH v6 02/15] block/io.c: drop assertion on double waiting for request serialisation Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 03/15] block/io: split out bdrv_find_conflicting_request Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 04/15] block/io: bdrv_wait_serialising_requests_locked: drop extra bs arg Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 05/15] block: bdrv_mark_request_serialising: split non-waiting function Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 06/15] block: introduce BDRV_REQ_NO_WAIT flag Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 07/15] block: bdrv_check_perm(): process children anyway Vladimir Sementsov-Ogievskiy
2020-09-24 14:25   ` Max Reitz
2020-09-24 14:55     ` Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 08/15] block: introduce preallocate filter Vladimir Sementsov-Ogievskiy
2020-09-24 16:50   ` Max Reitz
2020-09-24 17:36     ` Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 09/15] qemu-io: add preallocate mode parameter for truncate command Vladimir Sementsov-Ogievskiy
2020-09-24 17:08   ` Max Reitz
2020-09-18 18:19 ` [PATCH v6 10/15] iotests: qemu_io_silent: support --image-opts Vladimir Sementsov-Ogievskiy
2020-09-25  7:10   ` Max Reitz
2020-09-18 18:19 ` [PATCH v6 11/15] iotests: add 298 to test new preallocate filter driver Vladimir Sementsov-Ogievskiy
2020-09-25  8:26   ` Max Reitz
2020-09-25  8:49     ` Vladimir Sementsov-Ogievskiy
2020-09-25  9:11       ` Max Reitz
2020-09-25 15:11         ` Vladimir Sementsov-Ogievskiy
2020-09-25 15:32           ` Vladimir Sementsov-Ogievskiy
2020-10-01  7:40             ` Max Reitz
2020-10-01  8:41           ` Thomas Huth
2020-09-18 18:19 ` [PATCH v6 12/15] scripts/simplebench: support iops Vladimir Sementsov-Ogievskiy
2020-09-25  8:54   ` Max Reitz
2020-09-18 18:19 ` [PATCH v6 13/15] scripts/simplebench: improve view of ascii table Vladimir Sementsov-Ogievskiy
2020-09-25  9:31   ` Max Reitz
2020-09-25 16:58     ` Vladimir Sementsov-Ogievskiy
2020-09-18 18:19 ` [PATCH v6 14/15] scripts/simplebench: improve ascii table: add difference line Vladimir Sementsov-Ogievskiy
2020-09-25 10:24   ` Max Reitz
2020-09-25 17:13     ` Vladimir Sementsov-Ogievskiy
2020-10-01  8:22       ` Max Reitz
2020-09-18 18:19 ` [PATCH v6 15/15] scripts/simplebench: add bench_prealloc.py Vladimir Sementsov-Ogievskiy
2020-09-25 11:25   ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.