All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
@ 2019-04-08 16:26 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-08 16:26 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: mreitz, kwolf, fam, stefanha, eblake, den, vsementsov

Hi!

It's a continuation for
"[PATCH] qcow2: avoid lseek on block_status if possible"
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html

performance results for block-status on tmpfs [tests originally by Kevin,
now they are in 01]:

./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x

after 01:

plain: 81.77
forward: 82.61
prealloc: 0.01

after 02:

plain: 0.12
forward: 0.12
prealloc: 0.01

v2:
 01: new
 02: [mostly by Kevin's comments]
     - rewritten to go through new flag BDRV_BLOCK_RECURSE
     - never retry detection if failed for first time
     - rewrite detection to do less iterations and to be more simple

     iotests 102 behavior changed [and not sure about other two ones
     in comparison with v1, but it seems it doesn't matter]

     also, patch subject changed, as now it's a generic change for
     block layer


Vladimir Sementsov-Ogievskiy (2):
  tests/perf: Test lseek influence on qcow2 block-status
  block: avoid recursive block_status call if possible

 block/qcow2.h                              |  4 ++
 include/block/block.h                      |  8 ++-
 block/io.c                                 |  9 ++-
 block/qcow2-refcount.c                     | 32 ++++++++++
 block/qcow2.c                              | 11 ++++
 tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
 tests/qemu-iotests/102                     |  2 +-
 tests/qemu-iotests/102.out                 |  3 +-
 tests/qemu-iotests/141.out                 |  2 +-
 tests/qemu-iotests/144.out                 |  2 +-
 10 files changed, 138 insertions(+), 6 deletions(-)
 create mode 100755 tests/perf/block/qcow2/convert-blockstatus

-- 
2.18.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
@ 2019-04-08 16:26 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-08 16:26 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, fam, vsementsov, mreitz, stefanha, den

Hi!

It's a continuation for
"[PATCH] qcow2: avoid lseek on block_status if possible"
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html

performance results for block-status on tmpfs [tests originally by Kevin,
now they are in 01]:

./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x

after 01:

plain: 81.77
forward: 82.61
prealloc: 0.01

after 02:

plain: 0.12
forward: 0.12
prealloc: 0.01

v2:
 01: new
 02: [mostly by Kevin's comments]
     - rewritten to go through new flag BDRV_BLOCK_RECURSE
     - never retry detection if failed for first time
     - rewrite detection to do less iterations and to be more simple

     iotests 102 behavior changed [and not sure about other two ones
     in comparison with v1, but it seems it doesn't matter]

     also, patch subject changed, as now it's a generic change for
     block layer


Vladimir Sementsov-Ogievskiy (2):
  tests/perf: Test lseek influence on qcow2 block-status
  block: avoid recursive block_status call if possible

 block/qcow2.h                              |  4 ++
 include/block/block.h                      |  8 ++-
 block/io.c                                 |  9 ++-
 block/qcow2-refcount.c                     | 32 ++++++++++
 block/qcow2.c                              | 11 ++++
 tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
 tests/qemu-iotests/102                     |  2 +-
 tests/qemu-iotests/102.out                 |  3 +-
 tests/qemu-iotests/141.out                 |  2 +-
 tests/qemu-iotests/144.out                 |  2 +-
 10 files changed, 138 insertions(+), 6 deletions(-)
 create mode 100755 tests/perf/block/qcow2/convert-blockstatus

-- 
2.18.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v2 1/2] tests/perf: Test lseek influence on qcow2 block-status
@ 2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-08 16:26 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: mreitz, kwolf, fam, stefanha, eblake, den, vsementsov

Block layer may recursively check block_status in file child of qcow2,
if qcow2 driver returned DATA. There are several test cases to check
influence of lseek on block_status performance. To see real difference
run on tmpfs.

Tests originally created by Kevin, I just refactored and put them
together into one executable file with simple output.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100755 tests/perf/block/qcow2/convert-blockstatus

diff --git a/tests/perf/block/qcow2/convert-blockstatus b/tests/perf/block/qcow2/convert-blockstatus
new file mode 100755
index 0000000000..a1a3c1ef43
--- /dev/null
+++ b/tests/perf/block/qcow2/convert-blockstatus
@@ -0,0 +1,71 @@
+#!/bin/bash
+#
+# Test lseek influence on qcow2 block-status
+#
+# Block layer may recursively check block_status in file child of qcow2, if
+# qcow2 driver returned DATA. There are several test cases to check influence
+# of lseek on block_status performance. To see real difference run on tmpfs.
+#
+# Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
+#
+# Tests originally written by Kevin Wolf
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+if [ "$#" -lt 1 ]; then
+    echo "Usage: $0 SOURCE_FILE"
+    exit 1
+fi
+
+ROOT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../../../.." >/dev/null 2>&1 && pwd )"
+QEMU_IMG="$ROOT_DIR/qemu-img"
+QEMU_IO="$ROOT_DIR/qemu-io"
+
+size=1G
+src="$1"
+
+# test-case plain
+
+(
+$QEMU_IMG create -f qcow2 "$src" $size
+for i in $(seq 16384 -1 0); do
+    echo "write $((i * 65536)) 64k"
+done | $QEMU_IO "$src"
+) > /dev/null
+
+echo -n "plain: "
+/usr/bin/time -f %e $QEMU_IMG convert -n "$src" null-co://
+
+# test-case forward
+
+(
+$QEMU_IMG create -f qcow2 "$src" $size
+for i in $(seq 0 2 16384); do
+    echo "write $((i * 65536)) 64k"
+done | $QEMU_IO "$src"
+for i in $(seq 1 2 16384); do
+    echo "write $((i * 65536)) 64k"
+done | $QEMU_IO "$src"
+) > /dev/null
+
+echo -n "forward: "
+/usr/bin/time -f %e $QEMU_IMG convert -n "$src" null-co://
+
+# test-case prealloc
+
+$QEMU_IMG create -f qcow2 -o preallocation=metadata "$src" $size > /dev/null
+
+echo -n "prealloc: "
+/usr/bin/time -f %e $QEMU_IMG convert -n "$src" null-co://
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v2 1/2] tests/perf: Test lseek influence on qcow2 block-status
@ 2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-08 16:26 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, fam, vsementsov, mreitz, stefanha, den

Block layer may recursively check block_status in file child of qcow2,
if qcow2 driver returned DATA. There are several test cases to check
influence of lseek on block_status performance. To see real difference
run on tmpfs.

Tests originally created by Kevin, I just refactored and put them
together into one executable file with simple output.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100755 tests/perf/block/qcow2/convert-blockstatus

diff --git a/tests/perf/block/qcow2/convert-blockstatus b/tests/perf/block/qcow2/convert-blockstatus
new file mode 100755
index 0000000000..a1a3c1ef43
--- /dev/null
+++ b/tests/perf/block/qcow2/convert-blockstatus
@@ -0,0 +1,71 @@
+#!/bin/bash
+#
+# Test lseek influence on qcow2 block-status
+#
+# Block layer may recursively check block_status in file child of qcow2, if
+# qcow2 driver returned DATA. There are several test cases to check influence
+# of lseek on block_status performance. To see real difference run on tmpfs.
+#
+# Copyright (c) 2019 Virtuozzo International GmbH. All rights reserved.
+#
+# Tests originally written by Kevin Wolf
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+if [ "$#" -lt 1 ]; then
+    echo "Usage: $0 SOURCE_FILE"
+    exit 1
+fi
+
+ROOT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../../../.." >/dev/null 2>&1 && pwd )"
+QEMU_IMG="$ROOT_DIR/qemu-img"
+QEMU_IO="$ROOT_DIR/qemu-io"
+
+size=1G
+src="$1"
+
+# test-case plain
+
+(
+$QEMU_IMG create -f qcow2 "$src" $size
+for i in $(seq 16384 -1 0); do
+    echo "write $((i * 65536)) 64k"
+done | $QEMU_IO "$src"
+) > /dev/null
+
+echo -n "plain: "
+/usr/bin/time -f %e $QEMU_IMG convert -n "$src" null-co://
+
+# test-case forward
+
+(
+$QEMU_IMG create -f qcow2 "$src" $size
+for i in $(seq 0 2 16384); do
+    echo "write $((i * 65536)) 64k"
+done | $QEMU_IO "$src"
+for i in $(seq 1 2 16384); do
+    echo "write $((i * 65536)) 64k"
+done | $QEMU_IO "$src"
+) > /dev/null
+
+echo -n "forward: "
+/usr/bin/time -f %e $QEMU_IMG convert -n "$src" null-co://
+
+# test-case prealloc
+
+$QEMU_IMG create -f qcow2 -o preallocation=metadata "$src" $size > /dev/null
+
+echo -n "prealloc: "
+/usr/bin/time -f %e $QEMU_IMG convert -n "$src" null-co://
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
@ 2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-08 16:26 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: mreitz, kwolf, fam, stefanha, eblake, den, vsementsov

drv_co_block_status digs bs->file for additional, more accurate search
for hole inside region, reported as DATA by bs since 5daa74a6ebc.

This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
knows, where are holes and where is data. But every block_status
request calls lseek additionally. Assume a big disk, full of
data, in any iterative copying block job (or img convert) we'll call
lseek(HOLE) on every iteration, and each of these lseeks will have to
iterate through all metadata up to the end of file. It's obviously
ineffective behavior. And for many scenarios we don't need this lseek
at all.

However, lseek is needed when we have metadata-preallocated image.

So, let's detect metadata-preallocation case and don't dig qcow2's
protocol file in other cases.

The idea is to compare allocation size in POV of filesystem with
allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
significantly lower, consider it as metadata-preallocation case.

102 iotest changed, as our detector can't detect shrinked file as
metadata-preallocation, which don't seem to be wrong, as with metadata
preallocation we always have valid file length.

Other two iotests tiny changed QMP output sequence, which should be
exactly because skipped lseek at mirror beginning.

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h              |  4 ++++
 include/block/block.h      |  8 +++++++-
 block/io.c                 |  9 ++++++++-
 block/qcow2-refcount.c     | 32 ++++++++++++++++++++++++++++++++
 block/qcow2.c              | 11 +++++++++++
 tests/qemu-iotests/102     |  2 +-
 tests/qemu-iotests/102.out |  3 ++-
 tests/qemu-iotests/141.out |  2 +-
 tests/qemu-iotests/144.out |  2 +-
 9 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index fdee297f33..b6135d8271 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -350,6 +350,9 @@ typedef struct BDRVQcow2State {
     int nb_compress_threads;
 
     BdrvChild *data_file;
+
+    bool metadata_preallocation_checked;
+    bool metadata_preallocation;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -643,6 +646,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
                                 void *cb_opaque, Error **errp);
 int qcow2_shrink_reftable(BlockDriverState *bs);
 int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
+int qcow2_detect_metadata_preallocation(BlockDriverState *bs);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
diff --git a/include/block/block.h b/include/block/block.h
index c7a26199aa..1f2f08e4ee 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -156,10 +156,15 @@ typedef struct HDGeometry {
  * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this
  *                 layer, set by block layer
  *
- * Internal flag:
+ * Internal flags:
  * BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to request
  *                 that the block layer recompute the answer from the returned
  *                 BDS; must be accompanied by just BDRV_BLOCK_OFFSET_VALID.
+ * BDRV_BLOCK_RECURSE: request that the block layer will recursively search for
+ *                     zeroes in file child of current block node inside
+ *                     returned region. Only valid together with both
+ *                     BDRV_BLOCK_DATA and BDRV_BLOCK_OFFSET_VALID. Should not
+ *                     appear with BDRV_BLOCK_ZERO.
  *
  * If BDRV_BLOCK_OFFSET_VALID is set, the map parameter represents the
  * host offset within the returned BDS that is allocated for the
@@ -184,6 +189,7 @@ typedef struct HDGeometry {
 #define BDRV_BLOCK_RAW          0x08
 #define BDRV_BLOCK_ALLOCATED    0x10
 #define BDRV_BLOCK_EOF          0x20
+#define BDRV_BLOCK_RECURSE      0x40
 #define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
 
 typedef QSIMPLEQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;
diff --git a/block/io.c b/block/io.c
index dfc153b8d8..8595d4b504 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2121,6 +2121,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
      */
     assert(*pnum && QEMU_IS_ALIGNED(*pnum, align) &&
            align > offset - aligned_offset);
+    if (ret & BDRV_BLOCK_RECURSE) {
+        assert(ret & BDRV_BLOCK_DATA);
+        assert(ret & BDRV_BLOCK_OFFSET_VALID);
+        assert(!(ret & BDRV_BLOCK_ZERO));
+    }
+
     *pnum -= offset - aligned_offset;
     if (*pnum > bytes) {
         *pnum = bytes;
@@ -2151,7 +2157,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
         }
     }
 
-    if (want_zero && local_file && local_file != bs &&
+    if (want_zero && ret & BDRV_BLOCK_RECURSE &&
+        local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int64_t file_pnum;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e0fe322500..5786ba93e0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3422,3 +3422,35 @@ int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
                             "There are no references in the refcount table.");
     return -EIO;
 }
+
+int qcow2_detect_metadata_preallocation(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int64_t i, end_cluster, cluster_count = 0, threshold;
+    int64_t file_length, real_allocation, real_clusters;
+
+    file_length = bdrv_getlength(bs->file->bs);
+    if (file_length < 0) {
+        return file_length;
+    }
+
+    real_allocation = bdrv_get_allocated_file_size(bs->file->bs);
+    if (real_allocation < 0) {
+        return real_allocation;
+    }
+
+    real_clusters = real_allocation / s->cluster_size;
+    threshold = MAX(real_clusters * 10 / 9, real_clusters + 2);
+
+    end_cluster = size_to_clusters(s, file_length);
+    for (i = 0; i < end_cluster && cluster_count < threshold; i++) {
+        uint64_t refcount;
+        int ret = qcow2_get_refcount(bs, i, &refcount);
+        if (ret < 0) {
+            return ret;
+        }
+        cluster_count += !!refcount;
+    }
+
+    return cluster_count >= threshold;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index d507ee0686..e89e1c4fb1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1900,6 +1900,12 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     unsigned int bytes;
     int status = 0;
 
+    if (!s->metadata_preallocation_checked) {
+        ret = qcow2_detect_metadata_preallocation(bs);
+        s->metadata_preallocation = (ret == 1);
+        s->metadata_preallocation_checked = true;
+    }
+
     bytes = MIN(INT_MAX, count);
     qemu_co_mutex_lock(&s->lock);
     ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
@@ -1922,6 +1928,11 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
         status |= BDRV_BLOCK_DATA;
     }
+    if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
+        (status & BDRV_BLOCK_OFFSET_VALID))
+    {
+        status |= BDRV_BLOCK_RECURSE;
+    }
     return status;
 }
 
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index cedd2b25dc..125402db2b 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -56,7 +56,7 @@ $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG resize -f raw --shrink "$TEST_IMG" $((5 * 64 * 1024))
 
 $QEMU_IO -c map "$TEST_IMG"
-$QEMU_IMG map "$TEST_IMG"
+$QEMU_IMG map "$TEST_IMG" | _filter_qemu_img_map
 
 echo
 echo '=== Testing map on an image file truncated outside of qemu ==='
diff --git a/tests/qemu-iotests/102.out b/tests/qemu-iotests/102.out
index 4401b08fee..cd2fdc7f96 100644
--- a/tests/qemu-iotests/102.out
+++ b/tests/qemu-iotests/102.out
@@ -7,7 +7,8 @@ wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Image resized.
 64 KiB (0x10000) bytes     allocated at offset 0 bytes (0x0)
-Offset          Length          Mapped to       File
+Offset          Length          File
+0               0x10000         TEST_DIR/t.IMGFMT
 
 === Testing map on an image file truncated outside of qemu ===
 
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index 41c7291258..4d71d9dcae 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -42,9 +42,9 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
+{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
-{"return": {}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index 55299201e4..a9a8216bea 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -14,10 +14,10 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/
 
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "virtio0"}}
+{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "virtio0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "virtio0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
@ 2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-08 16:26 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, fam, vsementsov, mreitz, stefanha, den

drv_co_block_status digs bs->file for additional, more accurate search
for hole inside region, reported as DATA by bs since 5daa74a6ebc.

This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
knows, where are holes and where is data. But every block_status
request calls lseek additionally. Assume a big disk, full of
data, in any iterative copying block job (or img convert) we'll call
lseek(HOLE) on every iteration, and each of these lseeks will have to
iterate through all metadata up to the end of file. It's obviously
ineffective behavior. And for many scenarios we don't need this lseek
at all.

However, lseek is needed when we have metadata-preallocated image.

So, let's detect metadata-preallocation case and don't dig qcow2's
protocol file in other cases.

The idea is to compare allocation size in POV of filesystem with
allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
significantly lower, consider it as metadata-preallocation case.

102 iotest changed, as our detector can't detect shrinked file as
metadata-preallocation, which don't seem to be wrong, as with metadata
preallocation we always have valid file length.

Other two iotests tiny changed QMP output sequence, which should be
exactly because skipped lseek at mirror beginning.

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2.h              |  4 ++++
 include/block/block.h      |  8 +++++++-
 block/io.c                 |  9 ++++++++-
 block/qcow2-refcount.c     | 32 ++++++++++++++++++++++++++++++++
 block/qcow2.c              | 11 +++++++++++
 tests/qemu-iotests/102     |  2 +-
 tests/qemu-iotests/102.out |  3 ++-
 tests/qemu-iotests/141.out |  2 +-
 tests/qemu-iotests/144.out |  2 +-
 9 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index fdee297f33..b6135d8271 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -350,6 +350,9 @@ typedef struct BDRVQcow2State {
     int nb_compress_threads;
 
     BdrvChild *data_file;
+
+    bool metadata_preallocation_checked;
+    bool metadata_preallocation;
 } BDRVQcow2State;
 
 typedef struct Qcow2COWRegion {
@@ -643,6 +646,7 @@ int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
                                 void *cb_opaque, Error **errp);
 int qcow2_shrink_reftable(BlockDriverState *bs);
 int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size);
+int qcow2_detect_metadata_preallocation(BlockDriverState *bs);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
diff --git a/include/block/block.h b/include/block/block.h
index c7a26199aa..1f2f08e4ee 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -156,10 +156,15 @@ typedef struct HDGeometry {
  * BDRV_BLOCK_EOF: the returned pnum covers through end of file for this
  *                 layer, set by block layer
  *
- * Internal flag:
+ * Internal flags:
  * BDRV_BLOCK_RAW: for use by passthrough drivers, such as raw, to request
  *                 that the block layer recompute the answer from the returned
  *                 BDS; must be accompanied by just BDRV_BLOCK_OFFSET_VALID.
+ * BDRV_BLOCK_RECURSE: request that the block layer will recursively search for
+ *                     zeroes in file child of current block node inside
+ *                     returned region. Only valid together with both
+ *                     BDRV_BLOCK_DATA and BDRV_BLOCK_OFFSET_VALID. Should not
+ *                     appear with BDRV_BLOCK_ZERO.
  *
  * If BDRV_BLOCK_OFFSET_VALID is set, the map parameter represents the
  * host offset within the returned BDS that is allocated for the
@@ -184,6 +189,7 @@ typedef struct HDGeometry {
 #define BDRV_BLOCK_RAW          0x08
 #define BDRV_BLOCK_ALLOCATED    0x10
 #define BDRV_BLOCK_EOF          0x20
+#define BDRV_BLOCK_RECURSE      0x40
 #define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
 
 typedef QSIMPLEQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;
diff --git a/block/io.c b/block/io.c
index dfc153b8d8..8595d4b504 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2121,6 +2121,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
      */
     assert(*pnum && QEMU_IS_ALIGNED(*pnum, align) &&
            align > offset - aligned_offset);
+    if (ret & BDRV_BLOCK_RECURSE) {
+        assert(ret & BDRV_BLOCK_DATA);
+        assert(ret & BDRV_BLOCK_OFFSET_VALID);
+        assert(!(ret & BDRV_BLOCK_ZERO));
+    }
+
     *pnum -= offset - aligned_offset;
     if (*pnum > bytes) {
         *pnum = bytes;
@@ -2151,7 +2157,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
         }
     }
 
-    if (want_zero && local_file && local_file != bs &&
+    if (want_zero && ret & BDRV_BLOCK_RECURSE &&
+        local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int64_t file_pnum;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e0fe322500..5786ba93e0 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -3422,3 +3422,35 @@ int64_t qcow2_get_last_cluster(BlockDriverState *bs, int64_t size)
                             "There are no references in the refcount table.");
     return -EIO;
 }
+
+int qcow2_detect_metadata_preallocation(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int64_t i, end_cluster, cluster_count = 0, threshold;
+    int64_t file_length, real_allocation, real_clusters;
+
+    file_length = bdrv_getlength(bs->file->bs);
+    if (file_length < 0) {
+        return file_length;
+    }
+
+    real_allocation = bdrv_get_allocated_file_size(bs->file->bs);
+    if (real_allocation < 0) {
+        return real_allocation;
+    }
+
+    real_clusters = real_allocation / s->cluster_size;
+    threshold = MAX(real_clusters * 10 / 9, real_clusters + 2);
+
+    end_cluster = size_to_clusters(s, file_length);
+    for (i = 0; i < end_cluster && cluster_count < threshold; i++) {
+        uint64_t refcount;
+        int ret = qcow2_get_refcount(bs, i, &refcount);
+        if (ret < 0) {
+            return ret;
+        }
+        cluster_count += !!refcount;
+    }
+
+    return cluster_count >= threshold;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index d507ee0686..e89e1c4fb1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1900,6 +1900,12 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     unsigned int bytes;
     int status = 0;
 
+    if (!s->metadata_preallocation_checked) {
+        ret = qcow2_detect_metadata_preallocation(bs);
+        s->metadata_preallocation = (ret == 1);
+        s->metadata_preallocation_checked = true;
+    }
+
     bytes = MIN(INT_MAX, count);
     qemu_co_mutex_lock(&s->lock);
     ret = qcow2_get_cluster_offset(bs, offset, &bytes, &cluster_offset);
@@ -1922,6 +1928,11 @@ static int coroutine_fn qcow2_co_block_status(BlockDriverState *bs,
     } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
         status |= BDRV_BLOCK_DATA;
     }
+    if (s->metadata_preallocation && (status & BDRV_BLOCK_DATA) &&
+        (status & BDRV_BLOCK_OFFSET_VALID))
+    {
+        status |= BDRV_BLOCK_RECURSE;
+    }
     return status;
 }
 
diff --git a/tests/qemu-iotests/102 b/tests/qemu-iotests/102
index cedd2b25dc..125402db2b 100755
--- a/tests/qemu-iotests/102
+++ b/tests/qemu-iotests/102
@@ -56,7 +56,7 @@ $QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG resize -f raw --shrink "$TEST_IMG" $((5 * 64 * 1024))
 
 $QEMU_IO -c map "$TEST_IMG"
-$QEMU_IMG map "$TEST_IMG"
+$QEMU_IMG map "$TEST_IMG" | _filter_qemu_img_map
 
 echo
 echo '=== Testing map on an image file truncated outside of qemu ==='
diff --git a/tests/qemu-iotests/102.out b/tests/qemu-iotests/102.out
index 4401b08fee..cd2fdc7f96 100644
--- a/tests/qemu-iotests/102.out
+++ b/tests/qemu-iotests/102.out
@@ -7,7 +7,8 @@ wrote 65536/65536 bytes at offset 0
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Image resized.
 64 KiB (0x10000) bytes     allocated at offset 0 bytes (0x0)
-Offset          Length          Mapped to       File
+Offset          Length          File
+0               0x10000         TEST_DIR/t.IMGFMT
 
 === Testing map on an image file truncated outside of qemu ===
 
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index 41c7291258..4d71d9dcae 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -42,9 +42,9 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
+{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
-{"return": {}}
 {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
diff --git a/tests/qemu-iotests/144.out b/tests/qemu-iotests/144.out
index 55299201e4..a9a8216bea 100644
--- a/tests/qemu-iotests/144.out
+++ b/tests/qemu-iotests/144.out
@@ -14,10 +14,10 @@ Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 backing_file=TEST_DIR/
 
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "virtio0"}}
+{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "virtio0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
 {"return": {}}
-{"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "virtio0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "virtio0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
-- 
2.18.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
@ 2019-04-22  9:58   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-22  9:58 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: mreitz, kwolf, fam, stefanha, eblake, Denis Lunev

ping

08.04.2019 19:26, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> It's a continuation for
> "[PATCH] qcow2: avoid lseek on block_status if possible"
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html
> 
> performance results for block-status on tmpfs [tests originally by Kevin,
> now they are in 01]:
> 
> ./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x
> 
> after 01:
> 
> plain: 81.77
> forward: 82.61
> prealloc: 0.01
> 
> after 02:
> 
> plain: 0.12
> forward: 0.12
> prealloc: 0.01
> 
> v2:
>   01: new
>   02: [mostly by Kevin's comments]
>       - rewritten to go through new flag BDRV_BLOCK_RECURSE
>       - never retry detection if failed for first time
>       - rewrite detection to do less iterations and to be more simple
> 
>       iotests 102 behavior changed [and not sure about other two ones
>       in comparison with v1, but it seems it doesn't matter]
> 
>       also, patch subject changed, as now it's a generic change for
>       block layer
> 
> 
> Vladimir Sementsov-Ogievskiy (2):
>    tests/perf: Test lseek influence on qcow2 block-status
>    block: avoid recursive block_status call if possible
> 
>   block/qcow2.h                              |  4 ++
>   include/block/block.h                      |  8 ++-
>   block/io.c                                 |  9 ++-
>   block/qcow2-refcount.c                     | 32 ++++++++++
>   block/qcow2.c                              | 11 ++++
>   tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
>   tests/qemu-iotests/102                     |  2 +-
>   tests/qemu-iotests/102.out                 |  3 +-
>   tests/qemu-iotests/141.out                 |  2 +-
>   tests/qemu-iotests/144.out                 |  2 +-
>   10 files changed, 138 insertions(+), 6 deletions(-)
>   create mode 100755 tests/perf/block/qcow2/convert-blockstatus
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
@ 2019-04-22  9:58   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-22  9:58 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, fam, Denis Lunev, mreitz, stefanha

ping

08.04.2019 19:26, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> It's a continuation for
> "[PATCH] qcow2: avoid lseek on block_status if possible"
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html
> 
> performance results for block-status on tmpfs [tests originally by Kevin,
> now they are in 01]:
> 
> ./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x
> 
> after 01:
> 
> plain: 81.77
> forward: 82.61
> prealloc: 0.01
> 
> after 02:
> 
> plain: 0.12
> forward: 0.12
> prealloc: 0.01
> 
> v2:
>   01: new
>   02: [mostly by Kevin's comments]
>       - rewritten to go through new flag BDRV_BLOCK_RECURSE
>       - never retry detection if failed for first time
>       - rewrite detection to do less iterations and to be more simple
> 
>       iotests 102 behavior changed [and not sure about other two ones
>       in comparison with v1, but it seems it doesn't matter]
> 
>       also, patch subject changed, as now it's a generic change for
>       block layer
> 
> 
> Vladimir Sementsov-Ogievskiy (2):
>    tests/perf: Test lseek influence on qcow2 block-status
>    block: avoid recursive block_status call if possible
> 
>   block/qcow2.h                              |  4 ++
>   include/block/block.h                      |  8 ++-
>   block/io.c                                 |  9 ++-
>   block/qcow2-refcount.c                     | 32 ++++++++++
>   block/qcow2.c                              | 11 ++++
>   tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
>   tests/qemu-iotests/102                     |  2 +-
>   tests/qemu-iotests/102.out                 |  3 +-
>   tests/qemu-iotests/141.out                 |  2 +-
>   tests/qemu-iotests/144.out                 |  2 +-
>   10 files changed, 138 insertions(+), 6 deletions(-)
>   create mode 100755 tests/perf/block/qcow2/convert-blockstatus
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
  2019-04-08 16:26 ` Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  (?)
@ 2019-05-06 15:14 ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-06 15:14 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, fam, Denis Lunev, mreitz, stefanha

ping2

Hi!

My approach of adding perf test disliked by Max in other thread, so
01 may be just skipped.

What about 02?

08.04.2019 19:26, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> It's a continuation for
> "[PATCH] qcow2: avoid lseek on block_status if possible"
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html
> 
> performance results for block-status on tmpfs [tests originally by Kevin,
> now they are in 01]:
> 
> ./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x
> 
> after 01:
> 
> plain: 81.77
> forward: 82.61
> prealloc: 0.01
> 
> after 02:
> 
> plain: 0.12
> forward: 0.12
> prealloc: 0.01
> 
> v2:
>   01: new
>   02: [mostly by Kevin's comments]
>       - rewritten to go through new flag BDRV_BLOCK_RECURSE
>       - never retry detection if failed for first time
>       - rewrite detection to do less iterations and to be more simple
> 
>       iotests 102 behavior changed [and not sure about other two ones
>       in comparison with v1, but it seems it doesn't matter]
> 
>       also, patch subject changed, as now it's a generic change for
>       block layer
> 
> 
> Vladimir Sementsov-Ogievskiy (2):
>    tests/perf: Test lseek influence on qcow2 block-status
>    block: avoid recursive block_status call if possible
> 
>   block/qcow2.h                              |  4 ++
>   include/block/block.h                      |  8 ++-
>   block/io.c                                 |  9 ++-
>   block/qcow2-refcount.c                     | 32 ++++++++++
>   block/qcow2.c                              | 11 ++++
>   tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
>   tests/qemu-iotests/102                     |  2 +-
>   tests/qemu-iotests/102.out                 |  3 +-
>   tests/qemu-iotests/141.out                 |  2 +-
>   tests/qemu-iotests/144.out                 |  2 +-
>   10 files changed, 138 insertions(+), 6 deletions(-)
>   create mode 100755 tests/perf/block/qcow2/convert-blockstatus
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Qemu-devel] ping Re: [PATCH v2 for 4.1 0/2] avoid lseek on block_status
  2019-04-08 16:26 ` Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  (?)
@ 2019-05-21  7:50 ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-21  7:50 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: kwolf, fam, Denis Lunev, mreitz, stefanha

ping

08.04.2019 19:26, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> It's a continuation for
> "[PATCH] qcow2: avoid lseek on block_status if possible"
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html
> 
> performance results for block-status on tmpfs [tests originally by Kevin,
> now they are in 01]:
> 
> ./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x
> 
> after 01:
> 
> plain: 81.77
> forward: 82.61
> prealloc: 0.01
> 
> after 02:
> 
> plain: 0.12
> forward: 0.12
> prealloc: 0.01
> 
> v2:
>   01: new
>   02: [mostly by Kevin's comments]
>       - rewritten to go through new flag BDRV_BLOCK_RECURSE
>       - never retry detection if failed for first time
>       - rewrite detection to do less iterations and to be more simple
> 
>       iotests 102 behavior changed [and not sure about other two ones
>       in comparison with v1, but it seems it doesn't matter]
> 
>       also, patch subject changed, as now it's a generic change for
>       block layer
> 
> 
> Vladimir Sementsov-Ogievskiy (2):
>    tests/perf: Test lseek influence on qcow2 block-status
>    block: avoid recursive block_status call if possible
> 
>   block/qcow2.h                              |  4 ++
>   include/block/block.h                      |  8 ++-
>   block/io.c                                 |  9 ++-
>   block/qcow2-refcount.c                     | 32 ++++++++++
>   block/qcow2.c                              | 11 ++++
>   tests/perf/block/qcow2/convert-blockstatus | 71 ++++++++++++++++++++++
>   tests/qemu-iotests/102                     |  2 +-
>   tests/qemu-iotests/102.out                 |  3 +-
>   tests/qemu-iotests/141.out                 |  2 +-
>   tests/qemu-iotests/144.out                 |  2 +-
>   10 files changed, 138 insertions(+), 6 deletions(-)
>   create mode 100755 tests/perf/block/qcow2/convert-blockstatus
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
  2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
  (?)
@ 2019-05-22 10:32   ` Kevin Wolf
  -1 siblings, 0 replies; 16+ messages in thread
From: Kevin Wolf @ 2019-05-22 10:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: fam, qemu-block, qemu-devel, mreitz, stefanha, den

Am 08.04.2019 um 18:26 hat Vladimir Sementsov-Ogievskiy geschrieben:
> drv_co_block_status digs bs->file for additional, more accurate search
> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
> 
> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
> knows, where are holes and where is data. But every block_status
> request calls lseek additionally. Assume a big disk, full of
> data, in any iterative copying block job (or img convert) we'll call
> lseek(HOLE) on every iteration, and each of these lseeks will have to
> iterate through all metadata up to the end of file. It's obviously
> ineffective behavior. And for many scenarios we don't need this lseek
> at all.
> 
> However, lseek is needed when we have metadata-preallocated image.
> 
> So, let's detect metadata-preallocation case and don't dig qcow2's
> protocol file in other cases.
> 
> The idea is to compare allocation size in POV of filesystem with
> allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
> significantly lower, consider it as metadata-preallocation case.
> 
> 102 iotest changed, as our detector can't detect shrinked file as
> metadata-preallocation, which don't seem to be wrong, as with metadata
> preallocation we always have valid file length.
> 
> Other two iotests tiny changed QMP output sequence, which should be
> exactly because skipped lseek at mirror beginning.

No, these hunks don't show an improvement. An earlier {'return':{}}
means that the block job coroutine has yielded earlier, so it's doing a
blocking operation where it didn't do one before the patch.

What happens is that qcow2_detect_metadata_preallocation() causes
additional I/O by reading in the refcount block.

I'll modify the commit message accordingly while applying.

Kevin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
  2019-04-08 16:26 ` Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  (?)
@ 2019-05-22 10:47 ` Kevin Wolf
  2019-05-22 11:30   ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 1 reply; 16+ messages in thread
From: Kevin Wolf @ 2019-05-22 10:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: fam, qemu-block, qemu-devel, mreitz, stefanha, den

Am 08.04.2019 um 18:26 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi!
> 
> It's a continuation for
> "[PATCH] qcow2: avoid lseek on block_status if possible"
> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html
> 
> performance results for block-status on tmpfs [tests originally by Kevin,
> now they are in 01]:
> 
> ./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x
> 
> after 01:
> 
> plain: 81.77
> forward: 82.61
> prealloc: 0.01
> 
> after 02:
> 
> plain: 0.12
> forward: 0.12
> prealloc: 0.01

Thanks, applied to the block branch.

Kevin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status
  2019-05-22 10:47 ` [Qemu-devel] " Kevin Wolf
@ 2019-05-22 11:30   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-22 11:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: fam, Denis Lunev, qemu-block, qemu-devel, mreitz, stefanha

22.05.2019 13:47, Kevin Wolf wrote:
> Am 08.04.2019 um 18:26 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi!
>>
>> It's a continuation for
>> "[PATCH] qcow2: avoid lseek on block_status if possible"
>> https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06598.html
>>
>> performance results for block-status on tmpfs [tests originally by Kevin,
>> now they are in 01]:
>>
>> ./tests/perf/block/qcow2/convert-blockstatus /ramdisk/x
>>
>> after 01:
>>
>> plain: 81.77
>> forward: 82.61
>> prealloc: 0.01
>>
>> after 02:
>>
>> plain: 0.12
>> forward: 0.12
>> prealloc: 0.01
> 
> Thanks, applied to the block branch.
> 
> Kevin
> 

Thank you!

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
  2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
  (?)
  (?)
@ 2019-05-27 15:13   ` Max Reitz
  2019-05-28  6:39     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 1 reply; 16+ messages in thread
From: Max Reitz @ 2019-05-27 15:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, fam, stefanha, den

[-- Attachment #1: Type: text/plain, Size: 4055 bytes --]

On 08.04.19 18:26, Vladimir Sementsov-Ogievskiy wrote:
> drv_co_block_status digs bs->file for additional, more accurate search
> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
> 
> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
> knows, where are holes and where is data. But every block_status
> request calls lseek additionally. Assume a big disk, full of
> data, in any iterative copying block job (or img convert) we'll call
> lseek(HOLE) on every iteration, and each of these lseeks will have to
> iterate through all metadata up to the end of file. It's obviously
> ineffective behavior. And for many scenarios we don't need this lseek
> at all.
> 
> However, lseek is needed when we have metadata-preallocated image.
> 
> So, let's detect metadata-preallocation case and don't dig qcow2's
> protocol file in other cases.
> 
> The idea is to compare allocation size in POV of filesystem with
> allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
> significantly lower, consider it as metadata-preallocation case.
> 
> 102 iotest changed, as our detector can't detect shrinked file as
> metadata-preallocation, which don't seem to be wrong, as with metadata
> preallocation we always have valid file length.
> 
> Other two iotests tiny changed QMP output sequence, which should be
> exactly because skipped lseek at mirror beginning.
> 
> Suggested-by: Denis V. Lunev <den@openvz.org>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/qcow2.h              |  4 ++++
>  include/block/block.h      |  8 +++++++-
>  block/io.c                 |  9 ++++++++-
>  block/qcow2-refcount.c     | 32 ++++++++++++++++++++++++++++++++
>  block/qcow2.c              | 11 +++++++++++
>  tests/qemu-iotests/102     |  2 +-
>  tests/qemu-iotests/102.out |  3 ++-
>  tests/qemu-iotests/141.out |  2 +-
>  tests/qemu-iotests/144.out |  2 +-
>  9 files changed, 67 insertions(+), 6 deletions(-)

For me, this patch breaks iotests 141 (for qed) and 211 (for vdi):

> 141 1s ...        [17:11:53] [17:11:53] - output mismatch (see 141.out.bad)
> --- tests/qemu-iotests/141.out 2019-05-27 17:11:43.327664282 +0200
> +++ build/tests/qemu-iotests/141.out.bad       2019-05-27 17:11:53.949439880 +0200
> @@ -42,9 +42,9 @@
>  {"return": {}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
> +{"return": {}}
>  {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
>  {"return": {}}
>  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}

and

> 211 5s ...        [17:11:54] [17:11:58] - output mismatch (see 211.out.bad)
> --- tests/qemu-iotests/211.out 2019-05-22 19:58:34.870273427 +0200
> +++ build/tests/qemu-iotests/211.out.bad       2019-05-27 17:11:58.259348827 +0200
> @@ -55,8 +55,7 @@
>  virtual size: 32 MiB (33554432 bytes)
>  cluster_size: 1048576
>  
> -[{ "start": 0, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": 1024},
> -{ "start": 3072, "length": 33551360, "depth": 0, "zero": true, "data": true, "offset": 4096}]
> +[{ "start": 0, "length": 33554432, "depth": 0, "zero": false, "data": true, "offset": 1024}]
>  
>  === Invalid BlockdevRef ===

Doesn’t look too bad, but still, broken iotests are broken iotests. :/

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
  2019-05-27 15:13   ` Max Reitz
@ 2019-05-28  6:39     ` Vladimir Sementsov-Ogievskiy
  2019-05-28 15:47       ` Max Reitz
  0 siblings, 1 reply; 16+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-05-28  6:39 UTC (permalink / raw)
  To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf, fam, Denis Lunev, stefanha

27.05.2019 18:13, Max Reitz wrote:
> On 08.04.19 18:26, Vladimir Sementsov-Ogievskiy wrote:
>> drv_co_block_status digs bs->file for additional, more accurate search
>> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
>>
>> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
>> knows, where are holes and where is data. But every block_status
>> request calls lseek additionally. Assume a big disk, full of
>> data, in any iterative copying block job (or img convert) we'll call
>> lseek(HOLE) on every iteration, and each of these lseeks will have to
>> iterate through all metadata up to the end of file. It's obviously
>> ineffective behavior. And for many scenarios we don't need this lseek
>> at all.
>>
>> However, lseek is needed when we have metadata-preallocated image.
>>
>> So, let's detect metadata-preallocation case and don't dig qcow2's
>> protocol file in other cases.
>>
>> The idea is to compare allocation size in POV of filesystem with
>> allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
>> significantly lower, consider it as metadata-preallocation case.
>>
>> 102 iotest changed, as our detector can't detect shrinked file as
>> metadata-preallocation, which don't seem to be wrong, as with metadata
>> preallocation we always have valid file length.
>>
>> Other two iotests tiny changed QMP output sequence, which should be
>> exactly because skipped lseek at mirror beginning.
>>
>> Suggested-by: Denis V. Lunev <den@openvz.org>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/qcow2.h              |  4 ++++
>>   include/block/block.h      |  8 +++++++-
>>   block/io.c                 |  9 ++++++++-
>>   block/qcow2-refcount.c     | 32 ++++++++++++++++++++++++++++++++
>>   block/qcow2.c              | 11 +++++++++++
>>   tests/qemu-iotests/102     |  2 +-
>>   tests/qemu-iotests/102.out |  3 ++-
>>   tests/qemu-iotests/141.out |  2 +-
>>   tests/qemu-iotests/144.out |  2 +-
>>   9 files changed, 67 insertions(+), 6 deletions(-)
> 
> For me, this patch breaks iotests 141 (for qed) and 211 (for vdi):
> 
>> 141 1s ...        [17:11:53] [17:11:53] - output mismatch (see 141.out.bad)
>> --- tests/qemu-iotests/141.out 2019-05-27 17:11:43.327664282 +0200
>> +++ build/tests/qemu-iotests/141.out.bad       2019-05-27 17:11:53.949439880 +0200
>> @@ -42,9 +42,9 @@
>>   {"return": {}}
>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
>> -{"return": {}}
>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
>> +{"return": {}}
>>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
>>   {"return": {}}
>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> 
> and
> 
>> 211 5s ...        [17:11:54] [17:11:58] - output mismatch (see 211.out.bad)
>> --- tests/qemu-iotests/211.out 2019-05-22 19:58:34.870273427 +0200
>> +++ build/tests/qemu-iotests/211.out.bad       2019-05-27 17:11:58.259348827 +0200
>> @@ -55,8 +55,7 @@
>>   virtual size: 32 MiB (33554432 bytes)
>>   cluster_size: 1048576
>>   
>> -[{ "start": 0, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": 1024},
>> -{ "start": 3072, "length": 33551360, "depth": 0, "zero": true, "data": true, "offset": 4096}]
>> +[{ "start": 0, "length": 33554432, "depth": 0, "zero": false, "data": true, "offset": 1024}]
>>   
>>   === Invalid BlockdevRef ===
> 
> Doesn’t look too bad, but still, broken iotests are broken iotests. :/
> 


You are right, thanks for pointing to this! So, here is my investigation:

about 211: aha, looks like I just didn't implement metadata preallocation detection for vdi,
and default behavior is changed. I don't know vdi code, but after fast look, it seems that
it may be as simple as (and it fixes the test):


diff --git a/block/vdi.c b/block/vdi.c
index d7ef6628e7..a72333dcf4 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -542,7 +542,9 @@ static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
      *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
          index_in_block;
      *file = bs->file->bs;
-    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+
+    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID |
+            (s->header.image_type == VDI_TYPE_STATIC ? BDRV_BLOCK_RECURSE : 0);
  }



Or, we can rollback vdi behavior to pre-patch state:



diff --git a/block/vdi.c b/block/vdi.c
index d7ef6628e7..074256d845 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -542,7 +542,7 @@ static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
      *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
          index_in_block;
      *file = bs->file->bs;
-    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_RECURSE;
  }

  static int coroutine_fn




about 141: aha, it's a difference between qcow2 and qed, because of changed io sequence for
qcow2, as Kevin described "What happens is that qcow2_detect_metadata_preallocation() causes
additional I/O by reading in the refcount block.". I think correct way to fix it is to filter
{"return": {}} elements, as we can't be sure where they occur among events:




diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
index 2197a82d45..2704695921 100755
--- a/tests/qemu-iotests/141
+++ b/tests/qemu-iotests/141
@@ -58,16 +58,20 @@ test_blockjob()
                }}}" \
          'return'

+    # We may get {"return": {}} result of the following command among events in
+    # the following output or after all events in the next output (if $2 is an
+    # event, not 'return'). So, filter it here and in the following output to
+    # avoid race in test output.
      _send_qemu_cmd $QEMU_HANDLE \
          "$1" \
          "$2" \
-        | _filter_img_create
+        | _filter_img_create | _filter_qmp_empty_return

      # We want this to return an error because the block job is still running
      _send_qemu_cmd $QEMU_HANDLE \
          "{'execute': 'blockdev-del',
            'arguments': {'node-name': 'drv0'}}" \
-        'error' | _filter_generated_node_ids
+        'error' | _filter_generated_node_ids | _filter_qmp_empty_return

      _send_qemu_cmd $QEMU_HANDLE \
          "{'execute': 'block-job-cancel',
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index 4d71d9dcae..dbd3bdef6c 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -10,7 +10,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.
  Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"return": {}}
  {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
@@ -27,7 +26,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
-{"return": {}}
  {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: mirror"}}
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
@@ -42,7 +40,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
  {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
@@ -61,7 +58,6 @@ wrote 1048576/1048576 bytes at offset 0
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"return": {}}
  {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
@@ -77,7 +73,6 @@ wrote 1048576/1048576 bytes at offset 0
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
-{"return": {}}
  {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
  {"return": {}}
  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 35fddc746f..8e9235d6fe 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -219,5 +219,10 @@ _filter_nbd()
          -e 's#\(foo\|PORT/\?\|.sock\): Failed to .*$#\1#'
  }

+_filter_qmp_empty_return()
+{
+    grep -v '{"return": {}}'
+}
+
  # make sure this script returns success
  true




-- 
Best regards,
Vladimir

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible
  2019-05-28  6:39     ` Vladimir Sementsov-Ogievskiy
@ 2019-05-28 15:47       ` Max Reitz
  0 siblings, 0 replies; 16+ messages in thread
From: Max Reitz @ 2019-05-28 15:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel, qemu-block
  Cc: kwolf, fam, Denis Lunev, stefanha

[-- Attachment #1: Type: text/plain, Size: 12027 bytes --]

On 28.05.19 08:39, Vladimir Sementsov-Ogievskiy wrote:
> 27.05.2019 18:13, Max Reitz wrote:
>> On 08.04.19 18:26, Vladimir Sementsov-Ogievskiy wrote:
>>> drv_co_block_status digs bs->file for additional, more accurate search
>>> for hole inside region, reported as DATA by bs since 5daa74a6ebc.
>>>
>>> This accuracy is not free: assume we have qcow2 disk. Actually, qcow2
>>> knows, where are holes and where is data. But every block_status
>>> request calls lseek additionally. Assume a big disk, full of
>>> data, in any iterative copying block job (or img convert) we'll call
>>> lseek(HOLE) on every iteration, and each of these lseeks will have to
>>> iterate through all metadata up to the end of file. It's obviously
>>> ineffective behavior. And for many scenarios we don't need this lseek
>>> at all.
>>>
>>> However, lseek is needed when we have metadata-preallocated image.
>>>
>>> So, let's detect metadata-preallocation case and don't dig qcow2's
>>> protocol file in other cases.
>>>
>>> The idea is to compare allocation size in POV of filesystem with
>>> allocations size in POV of Qcow2 (by refcounts). If allocation in fs is
>>> significantly lower, consider it as metadata-preallocation case.
>>>
>>> 102 iotest changed, as our detector can't detect shrinked file as
>>> metadata-preallocation, which don't seem to be wrong, as with metadata
>>> preallocation we always have valid file length.
>>>
>>> Other two iotests tiny changed QMP output sequence, which should be
>>> exactly because skipped lseek at mirror beginning.
>>>
>>> Suggested-by: Denis V. Lunev <den@openvz.org>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/qcow2.h              |  4 ++++
>>>   include/block/block.h      |  8 +++++++-
>>>   block/io.c                 |  9 ++++++++-
>>>   block/qcow2-refcount.c     | 32 ++++++++++++++++++++++++++++++++
>>>   block/qcow2.c              | 11 +++++++++++
>>>   tests/qemu-iotests/102     |  2 +-
>>>   tests/qemu-iotests/102.out |  3 ++-
>>>   tests/qemu-iotests/141.out |  2 +-
>>>   tests/qemu-iotests/144.out |  2 +-
>>>   9 files changed, 67 insertions(+), 6 deletions(-)
>>
>> For me, this patch breaks iotests 141 (for qed) and 211 (for vdi):
>>
>>> 141 1s ...        [17:11:53] [17:11:53] - output mismatch (see 141.out.bad)
>>> --- tests/qemu-iotests/141.out 2019-05-27 17:11:43.327664282 +0200
>>> +++ build/tests/qemu-iotests/141.out.bad       2019-05-27 17:11:53.949439880 +0200
>>> @@ -42,9 +42,9 @@
>>>   {"return": {}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
>>> -{"return": {}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
>>> +{"return": {}}
>>>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
>>>   {"return": {}}
>>>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
>>
>> and
>>
>>> 211 5s ...        [17:11:54] [17:11:58] - output mismatch (see 211.out.bad)
>>> --- tests/qemu-iotests/211.out 2019-05-22 19:58:34.870273427 +0200
>>> +++ build/tests/qemu-iotests/211.out.bad       2019-05-27 17:11:58.259348827 +0200
>>> @@ -55,8 +55,7 @@
>>>   virtual size: 32 MiB (33554432 bytes)
>>>   cluster_size: 1048576
>>>   
>>> -[{ "start": 0, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": 1024},
>>> -{ "start": 3072, "length": 33551360, "depth": 0, "zero": true, "data": true, "offset": 4096}]
>>> +[{ "start": 0, "length": 33554432, "depth": 0, "zero": false, "data": true, "offset": 1024}]
>>>   
>>>   === Invalid BlockdevRef ===
>>
>> Doesn’t look too bad, but still, broken iotests are broken iotests. :/
>>
> 
> 
> You are right, thanks for pointing to this! So, here is my investigation:
> 
> about 211: aha, looks like I just didn't implement metadata preallocation detection for vdi,
> and default behavior is changed. I don't know vdi code, but after fast look, it seems that
> it may be as simple as (and it fixes the test):
> 
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index d7ef6628e7..a72333dcf4 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -542,7 +542,9 @@ static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
>       *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
>           index_in_block;
>       *file = bs->file->bs;
> -    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +
> +    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID |
> +            (s->header.image_type == VDI_TYPE_STATIC ? BDRV_BLOCK_RECURSE : 0);
>   }

That would be the same as what this patch implements for qcow2, yes.

> Or, we can rollback vdi behavior to pre-patch state:
> 
> 
> 
> diff --git a/block/vdi.c b/block/vdi.c
> index d7ef6628e7..074256d845 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -542,7 +542,7 @@ static int coroutine_fn vdi_co_block_status(BlockDriverState *bs,
>       *map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
>           index_in_block;
>       *file = bs->file->bs;
> -    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_RECURSE;
>   }
> 
>   static int coroutine_fn
> 
> 
> 
> 
> about 141: aha, it's a difference between qcow2 and qed, because of changed io sequence for
> qcow2, as Kevin described "What happens is that qcow2_detect_metadata_preallocation() causes
> additional I/O by reading in the refcount block.". I think correct way to fix it is to filter
> {"return": {}} elements, as we can't be sure where they occur among events:
> 
> 
> 
> 
> diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
> index 2197a82d45..2704695921 100755
> --- a/tests/qemu-iotests/141
> +++ b/tests/qemu-iotests/141
> @@ -58,16 +58,20 @@ test_blockjob()
>                 }}}" \
>           'return'
> 
> +    # We may get {"return": {}} result of the following command among events in
> +    # the following output or after all events in the next output (if $2 is an
> +    # event, not 'return'). So, filter it here and in the following output to
> +    # avoid race in test output.
>       _send_qemu_cmd $QEMU_HANDLE \
>           "$1" \
>           "$2" \
> -        | _filter_img_create
> +        | _filter_img_create | _filter_qmp_empty_return
> 
>       # We want this to return an error because the block job is still running
>       _send_qemu_cmd $QEMU_HANDLE \
>           "{'execute': 'blockdev-del',
>             'arguments': {'node-name': 'drv0'}}" \
> -        'error' | _filter_generated_node_ids
> +        'error' | _filter_generated_node_ids | _filter_qmp_empty_return
> 
>       _send_qemu_cmd $QEMU_HANDLE \
>           "{'execute': 'block-job-cancel',
> diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
> index 4d71d9dcae..dbd3bdef6c 100644
> --- a/tests/qemu-iotests/141.out
> +++ b/tests/qemu-iotests/141.out
> @@ -10,7 +10,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/m.
>   Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> @@ -27,7 +26,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "mirror"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: mirror"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> @@ -42,7 +40,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: commit"}}
> @@ -61,7 +58,6 @@ wrote 1048576/1048576 bytes at offset 0
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> @@ -77,7 +73,6 @@ wrote 1048576/1048576 bytes at offset 0
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
> index 35fddc746f..8e9235d6fe 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -219,5 +219,10 @@ _filter_nbd()
>           -e 's#\(foo\|PORT/\?\|.sock\): Failed to .*$#\1#'
>   }
> 
> +_filter_qmp_empty_return()
> +{
> +    grep -v '{"return": {}}'
> +}
> +
>   # make sure this script returns success
>   true

I’m not really a fan of this, but I don’t think there is an alternative
for a bash test.  So that does look like the best solution for me.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-05-28 15:49 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 16:26 [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status Vladimir Sementsov-Ogievskiy
2019-04-08 16:26 ` Vladimir Sementsov-Ogievskiy
2019-04-08 16:26 ` [Qemu-devel] [PATCH v2 1/2] tests/perf: Test lseek influence on qcow2 block-status Vladimir Sementsov-Ogievskiy
2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
2019-04-08 16:26 ` [Qemu-devel] [PATCH v2 2/2] block: avoid recursive block_status call if possible Vladimir Sementsov-Ogievskiy
2019-04-08 16:26   ` Vladimir Sementsov-Ogievskiy
2019-05-22 10:32   ` Kevin Wolf
2019-05-27 15:13   ` Max Reitz
2019-05-28  6:39     ` Vladimir Sementsov-Ogievskiy
2019-05-28 15:47       ` Max Reitz
2019-04-22  9:58 ` [Qemu-devel] [PATCH v2 for 4.1 0/2] avoid lseek on block_status Vladimir Sementsov-Ogievskiy
2019-04-22  9:58   ` Vladimir Sementsov-Ogievskiy
2019-05-06 15:14 ` Vladimir Sementsov-Ogievskiy
2019-05-21  7:50 ` [Qemu-devel] ping " Vladimir Sementsov-Ogievskiy
2019-05-22 10:47 ` [Qemu-devel] " Kevin Wolf
2019-05-22 11:30   ` Vladimir Sementsov-Ogievskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.