All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] quorum: Implement bdrv_co_block_status()
@ 2020-11-11 16:53 Alberto Garcia
  2020-11-11 16:53 ` [PATCH v3 1/2] " Alberto Garcia
  2020-11-11 16:53 ` [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes() Alberto Garcia
  0 siblings, 2 replies; 9+ messages in thread
From: Alberto Garcia @ 2020-11-11 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Tao Xu, Alberto Garcia, qemu-block, Max Reitz

Hi,

I realized that if one of the children returns an error during
co_block_status() then we should not pass the error immediately to the
caller, because that's a situation that Quorum should be able to
handle.

The new version of the patch takes the simpler approach of falling
back to returning BDRV_BLOCK_DATA, as if Quorum did not implement
bdrv_co_block_status(). This will force the caller to try to read the
actual data and the normal Quorum voting and error handling process
will be used.

Berto

v3:
- Fall back to BDRV_BLOCK_DATA if a child returns an error.

v2: https://lists.gnu.org/archive/html/qemu-block/2020-11/msg00259.html
- Implement bdrv_co_pwrite_zeroes() for quorum

v1: https://lists.gnu.org/archive/html/qemu-block/2020-11/msg00163.html


Alberto Garcia (2):
  quorum: Implement bdrv_co_block_status()
  quorum: Implement bdrv_co_pwrite_zeroes()

 block/quorum.c             |  70 ++++++++++++++++-
 tests/qemu-iotests/312     | 155 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/312.out |  71 +++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 4 files changed, 295 insertions(+), 2 deletions(-)
 create mode 100755 tests/qemu-iotests/312
 create mode 100644 tests/qemu-iotests/312.out

-- 
2.20.1



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] quorum: Implement bdrv_co_block_status()
  2020-11-11 16:53 [PATCH v3 0/2] quorum: Implement bdrv_co_block_status() Alberto Garcia
@ 2020-11-11 16:53 ` Alberto Garcia
  2020-11-13 11:36   ` Max Reitz
  2020-11-11 16:53 ` [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes() Alberto Garcia
  1 sibling, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2020-11-11 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Tao Xu, Alberto Garcia, qemu-block, Max Reitz

The quorum driver does not implement bdrv_co_block_status() and
because of that it always reports to contain data even if all its
children are known to be empty.

One consequence of this is that if we for example create a quorum with
a size of 10GB and we mirror it to a new image the operation will
write 10GB of actual zeroes to the destination image wasting a lot of
time and disk space.

Since a quorum has an arbitrary number of children of potentially
different formats there is no way to report all possible allocation
status flags in a way that makes sense, so this implementation only
reports when a given region is known to contain zeroes
(BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA).

If all children agree that a region contains zeroes then we can return
BDRV_BLOCK_ZERO using the smallest size reported by the children
(because all agree that a region of at least that size contains
zeroes).

If at least one child disagrees we have to return BDRV_BLOCK_DATA.
In this case we use the largest of the sizes reported by the children
that didn't return BDRV_BLOCK_ZERO (because we know that there won't
be an agreement for at least that size).

Signed-off-by: Alberto Garcia <berto@igalia.com>
Tested-by: Tao Xu <tao3.xu@intel.com>
---
 block/quorum.c             |  52 +++++++++++++
 tests/qemu-iotests/312     | 148 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/312.out |  67 +++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 4 files changed, 268 insertions(+)
 create mode 100755 tests/qemu-iotests/312
 create mode 100644 tests/qemu-iotests/312.out

diff --git a/block/quorum.c b/block/quorum.c
index e846a7e892..9691a9bee9 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -18,6 +18,7 @@
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "block/block_int.h"
+#include "block/coroutines.h"
 #include "block/qdict.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-block.h"
@@ -1174,6 +1175,56 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
              | DEFAULT_PERM_UNCHANGED;
 }
 
+/*
+ * Each one of the children can report different status flags even
+ * when they contain the same data, so what this function does is
+ * return BDRV_BLOCK_ZERO if *all* children agree that a certain
+ * region contains zeroes, and BDRV_BLOCK_DATA otherwise.
+ */
+static int coroutine_fn quorum_co_block_status(BlockDriverState *bs,
+                                               bool want_zero,
+                                               int64_t offset, int64_t count,
+                                               int64_t *pnum, int64_t *map,
+                                               BlockDriverState **file)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int i, ret;
+    int64_t pnum_zero = count;
+    int64_t pnum_data = 0;
+
+    for (i = 0; i < s->num_children; i++) {
+        int64_t bytes;
+        ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, false,
+                                                want_zero, offset, count,
+                                                &bytes, NULL, NULL, NULL);
+        if (ret < 0) {
+            quorum_report_bad(QUORUM_OP_TYPE_READ, offset, count,
+                              s->children[i]->bs->node_name, ret);
+            pnum_data = count;
+            break;
+        }
+        /*
+         * Even if all children agree about whether there are zeroes
+         * or not at @offset they might disagree on the size, so use
+         * the smallest when reporting BDRV_BLOCK_ZERO and the largest
+         * when reporting BDRV_BLOCK_DATA.
+         */
+        if (ret & BDRV_BLOCK_ZERO) {
+            pnum_zero = MIN(pnum_zero, bytes);
+        } else {
+            pnum_data = MAX(pnum_data, bytes);
+        }
+    }
+
+    if (pnum_data) {
+        *pnum = pnum_data;
+        return BDRV_BLOCK_DATA;
+    } else {
+        *pnum = pnum_zero;
+        return BDRV_BLOCK_ZERO;
+    }
+}
+
 static const char *const quorum_strong_runtime_opts[] = {
     QUORUM_OPT_VOTE_THRESHOLD,
     QUORUM_OPT_BLKVERIFY,
@@ -1192,6 +1243,7 @@ static BlockDriver bdrv_quorum = {
     .bdrv_close                         = quorum_close,
     .bdrv_gather_child_options          = quorum_gather_child_options,
     .bdrv_dirname                       = quorum_dirname,
+    .bdrv_co_block_status               = quorum_co_block_status,
 
     .bdrv_co_flush_to_disk              = quorum_co_flush,
 
diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312
new file mode 100755
index 0000000000..1b08f1552f
--- /dev/null
+++ b/tests/qemu-iotests/312
@@ -0,0 +1,148 @@
+#!/usr/bin/env bash
+#
+# Test drive-mirror with quorum
+#
+# The goal of this test is to check how the quorum driver reports
+# regions that are known to read as zeroes (BDRV_BLOCK_ZERO). The idea
+# is that drive-mirror will try the efficient representation of zeroes
+# in the destination image instead of writing actual zeroes.
+#
+# Copyright (C) 2020 Igalia, S.L.
+# Author: Alberto Garcia <berto@igalia.com>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=berto@igalia.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _rm_test_img "$TEST_IMG.0"
+    _rm_test_img "$TEST_IMG.1"
+    _rm_test_img "$TEST_IMG.2"
+    _rm_test_img "$TEST_IMG.3"
+    _cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+_unsupported_imgopts cluster_size data_file
+
+echo
+echo '### Create all images' # three source (quorum), one destination
+echo
+TEST_IMG="$TEST_IMG.0" _make_test_img -o cluster_size=64k 10M
+TEST_IMG="$TEST_IMG.1" _make_test_img -o cluster_size=64k 10M
+TEST_IMG="$TEST_IMG.2" _make_test_img -o cluster_size=64k 10M
+TEST_IMG="$TEST_IMG.3" _make_test_img -o cluster_size=64k 10M
+
+quorum="driver=raw,file.driver=quorum,file.vote-threshold=2"
+quorum="$quorum,file.children.0.file.filename=$TEST_IMG.0"
+quorum="$quorum,file.children.1.file.filename=$TEST_IMG.1"
+quorum="$quorum,file.children.2.file.filename=$TEST_IMG.2"
+quorum="$quorum,file.children.0.driver=$IMGFMT"
+quorum="$quorum,file.children.1.driver=$IMGFMT"
+quorum="$quorum,file.children.2.driver=$IMGFMT"
+
+echo
+echo '### Output of qemu-img map (empty quorum)'
+echo
+$QEMU_IMG map --image-opts $quorum | _filter_qemu_img_map
+
+# Now we write data to the quorum. All three images will read as
+# zeroes in all cases, but with different ways to represent them
+# (unallocated clusters, zero clusters, data clusters with zeroes)
+# that will have an effect on how the data will be mirrored and the
+# output of qemu-img map on the resulting image.
+echo
+echo '### Write data to the quorum'
+echo
+# Test 1: data regions surrounded by unallocated clusters.
+# Three data regions, the largest one (0x30000) will be picked, end result:
+# offset 0x10000, length 0x30000 -> data
+$QEMU_IO -c "write -P 0 $((0x10000)) $((0x10000))" "$TEST_IMG.0" | _filter_qemu_io
+$QEMU_IO -c "write -P 0 $((0x10000)) $((0x30000))" "$TEST_IMG.1" | _filter_qemu_io
+$QEMU_IO -c "write -P 0 $((0x10000)) $((0x20000))" "$TEST_IMG.2" | _filter_qemu_io
+
+# Test 2: zero regions surrounded by data clusters.
+# First we allocate the data clusters.
+$QEMU_IO -c "open -o $quorum" -c "write -P 0 $((0x100000)) $((0x40000))" | _filter_qemu_io
+
+# Three zero regions, the smallest one (0x10000) will be picked, end result:
+# offset 0x100000, length 0x10000 -> data
+# offset 0x110000, length 0x10000 -> zeroes
+# offset 0x120000, length 0x20000 -> data
+$QEMU_IO -c "write -z $((0x110000)) $((0x10000))" "$TEST_IMG.0" | _filter_qemu_io
+$QEMU_IO -c "write -z $((0x110000)) $((0x30000))" "$TEST_IMG.1" | _filter_qemu_io
+$QEMU_IO -c "write -z $((0x110000)) $((0x20000))" "$TEST_IMG.2" | _filter_qemu_io
+
+# Test 3: zero clusters surrounded by unallocated clusters.
+# Everything reads as zeroes, no effect on the end result.
+$QEMU_IO -c "write -z $((0x150000)) $((0x10000))" "$TEST_IMG.0" | _filter_qemu_io
+$QEMU_IO -c "write -z $((0x150000)) $((0x30000))" "$TEST_IMG.1" | _filter_qemu_io
+$QEMU_IO -c "write -z $((0x150000)) $((0x20000))" "$TEST_IMG.2" | _filter_qemu_io
+
+# Test 4: mix of data and zero clusters.
+# The zero region will be ignored in favor of the largest data region
+# (0x20000), end result:
+# offset 0x200000, length 0x20000 -> data
+$QEMU_IO -c "write -P 0 $((0x200000)) $((0x10000))" "$TEST_IMG.0" | _filter_qemu_io
+$QEMU_IO -c "write -z   $((0x200000)) $((0x30000))" "$TEST_IMG.1" | _filter_qemu_io
+$QEMU_IO -c "write -P 0 $((0x200000)) $((0x20000))" "$TEST_IMG.2" | _filter_qemu_io
+
+echo
+echo '### Launch the drive-mirror job'
+echo
+qemu_comm_method="qmp" _launch_qemu -drive if=virtio,"$quorum"
+h=$QEMU_HANDLE
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
+
+_send_qemu_cmd $h \
+    "{'execute': 'drive-mirror',
+                 'arguments': {'device': 'virtio0',
+                               'format': '$IMGFMT',
+                               'target': '$TEST_IMG.3',
+                               'sync':   'full',
+                               'mode':   'existing' }}"    \
+     "BLOCK_JOB_READY.*virtio0"
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'block-job-complete',
+       'arguments': { 'device': 'virtio0' } }" \
+    'BLOCK_JOB_COMPLETED'
+
+_send_qemu_cmd $h "{ 'execute': 'quit' }" ''
+
+echo
+echo '### Output of qemu-img map (destination image)'
+echo
+$QEMU_IMG map "$TEST_IMG.3" | _filter_qemu_img_map
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/312.out b/tests/qemu-iotests/312.out
new file mode 100644
index 0000000000..4ae749175b
--- /dev/null
+++ b/tests/qemu-iotests/312.out
@@ -0,0 +1,67 @@
+QA output created by 312
+
+### Create all images
+
+Formatting 'TEST_DIR/t.IMGFMT.0', fmt=IMGFMT size=10485760
+Formatting 'TEST_DIR/t.IMGFMT.1', fmt=IMGFMT size=10485760
+Formatting 'TEST_DIR/t.IMGFMT.2', fmt=IMGFMT size=10485760
+Formatting 'TEST_DIR/t.IMGFMT.3', fmt=IMGFMT size=10485760
+
+### Output of qemu-img map (empty quorum)
+
+Offset          Length          File
+
+### Write data to the quorum
+
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 65536
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 131072/131072 bytes at offset 65536
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 262144/262144 bytes at offset 1048576
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 1114112
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 1114112
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 131072/131072 bytes at offset 1114112
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 1376256
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 1376256
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 131072/131072 bytes at offset 1376256
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 2097152
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 2097152
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 131072/131072 bytes at offset 2097152
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+### Launch the drive-mirror job
+
+{ 'execute': 'qmp_capabilities' }
+{"return": {}}
+{'execute': 'drive-mirror', 'arguments': {'device': 'virtio0', 'format': 'IMGFMT', 'target': 'TEST_DIR/t.IMGFMT.3', 'sync': 'full', 'mode': 'existing' }}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "virtio0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "virtio0"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "virtio0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "virtio0", "len": 10485760, "offset": 10485760, "speed": 0, "type": "mirror"}}
+{ 'execute': 'block-job-complete', 'arguments': { 'device': 'virtio0' } }
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "virtio0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "virtio0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "virtio0", "len": 10485760, "offset": 10485760, "speed": 0, "type": "mirror"}}
+{ 'execute': 'quit' }
+
+### Output of qemu-img map (destination image)
+
+Offset          Length          File
+0x10000         0x30000         TEST_DIR/t.IMGFMT.3
+0x100000        0x10000         TEST_DIR/t.IMGFMT.3
+0x120000        0x20000         TEST_DIR/t.IMGFMT.3
+0x200000        0x20000         TEST_DIR/t.IMGFMT.3
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 2960dff728..2a218c8918 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -316,3 +316,4 @@
 305 rw quick
 307 rw quick export
 309 rw auto quick
+312 rw auto quick
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
  2020-11-11 16:53 [PATCH v3 0/2] quorum: Implement bdrv_co_block_status() Alberto Garcia
  2020-11-11 16:53 ` [PATCH v3 1/2] " Alberto Garcia
@ 2020-11-11 16:53 ` Alberto Garcia
  2020-11-13 11:49   ` Max Reitz
  1 sibling, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2020-11-11 16:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Tao Xu, Alberto Garcia, qemu-block, Max Reitz

This simply calls bdrv_co_pwrite_zeroes() in all children

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c             | 18 ++++++++++++++++--
 tests/qemu-iotests/312     |  7 +++++++
 tests/qemu-iotests/312.out |  4 ++++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 9691a9bee9..c81572f513 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -692,8 +692,13 @@ static void write_quorum_entry(void *opaque)
     QuorumChildRequest *sacb = &acb->qcrs[i];
 
     sacb->bs = s->children[i]->bs;
-    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
-                                acb->qiov, acb->flags);
+    if (acb->flags & BDRV_REQ_ZERO_WRITE) {
+        sacb->ret = bdrv_co_pwrite_zeroes(s->children[i], acb->offset,
+                                          acb->bytes, acb->flags);
+    } else {
+        sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
+                                    acb->qiov, acb->flags);
+    }
     if (sacb->ret == 0) {
         acb->success_count++;
     } else {
@@ -739,6 +744,14 @@ static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
     return ret;
 }
 
+static int quorum_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+                                   int bytes, BdrvRequestFlags flags)
+
+{
+    return quorum_co_pwritev(bs, offset, bytes, NULL,
+                             flags | BDRV_REQ_ZERO_WRITE);
+}
+
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
@@ -1251,6 +1264,7 @@ static BlockDriver bdrv_quorum = {
 
     .bdrv_co_preadv                     = quorum_co_preadv,
     .bdrv_co_pwritev                    = quorum_co_pwritev,
+    .bdrv_co_pwrite_zeroes              = quorum_co_pwrite_zeroes,
 
     .bdrv_add_child                     = quorum_add_child,
     .bdrv_del_child                     = quorum_del_child,
diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312
index 1b08f1552f..93046393e7 100755
--- a/tests/qemu-iotests/312
+++ b/tests/qemu-iotests/312
@@ -114,6 +114,13 @@ $QEMU_IO -c "write -P 0 $((0x200000)) $((0x10000))" "$TEST_IMG.0" | _filter_qemu
 $QEMU_IO -c "write -z   $((0x200000)) $((0x30000))" "$TEST_IMG.1" | _filter_qemu_io
 $QEMU_IO -c "write -P 0 $((0x200000)) $((0x20000))" "$TEST_IMG.2" | _filter_qemu_io
 
+# Test 5: write data to a region and then zeroize it, doing it
+# directly on the quorum device instead of the individual images.
+# This has no effect on the end result but proves that the quorum driver
+# supports 'write -z'.
+$QEMU_IO -c "open -o $quorum" -c "write $((0x250000)) $((0x10000))" | _filter_qemu_io
+$QEMU_IO -c "open -o $quorum" -c "write -z $((0x250000)) $((0x10000))" | _filter_qemu_io
+
 echo
 echo '### Launch the drive-mirror job'
 echo
diff --git a/tests/qemu-iotests/312.out b/tests/qemu-iotests/312.out
index 4ae749175b..778dda95c7 100644
--- a/tests/qemu-iotests/312.out
+++ b/tests/qemu-iotests/312.out
@@ -39,6 +39,10 @@ wrote 196608/196608 bytes at offset 2097152
 192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 131072/131072 bytes at offset 2097152
 128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 2424832
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 2424832
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
 ### Launch the drive-mirror job
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] quorum: Implement bdrv_co_block_status()
  2020-11-11 16:53 ` [PATCH v3 1/2] " Alberto Garcia
@ 2020-11-13 11:36   ` Max Reitz
  0 siblings, 0 replies; 9+ messages in thread
From: Max Reitz @ 2020-11-13 11:36 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Tao Xu, qemu-block

On 11.11.20 17:53, Alberto Garcia wrote:
> The quorum driver does not implement bdrv_co_block_status() and
> because of that it always reports to contain data even if all its
> children are known to be empty.
> 
> One consequence of this is that if we for example create a quorum with
> a size of 10GB and we mirror it to a new image the operation will
> write 10GB of actual zeroes to the destination image wasting a lot of
> time and disk space.
> 
> Since a quorum has an arbitrary number of children of potentially
> different formats there is no way to report all possible allocation
> status flags in a way that makes sense, so this implementation only
> reports when a given region is known to contain zeroes
> (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA).
> 
> If all children agree that a region contains zeroes then we can return
> BDRV_BLOCK_ZERO using the smallest size reported by the children
> (because all agree that a region of at least that size contains
> zeroes).
> 
> If at least one child disagrees we have to return BDRV_BLOCK_DATA.
> In this case we use the largest of the sizes reported by the children
> that didn't return BDRV_BLOCK_ZERO (because we know that there won't
> be an agreement for at least that size).
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> Tested-by: Tao Xu <tao3.xu@intel.com>
> ---
>   block/quorum.c             |  52 +++++++++++++
>   tests/qemu-iotests/312     | 148 +++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/312.out |  67 +++++++++++++++++
>   tests/qemu-iotests/group   |   1 +
>   4 files changed, 268 insertions(+)
>   create mode 100755 tests/qemu-iotests/312
>   create mode 100644 tests/qemu-iotests/312.out

Reviewed-by: Max Reitz <mreitz@redhat.com>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
  2020-11-11 16:53 ` [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes() Alberto Garcia
@ 2020-11-13 11:49   ` Max Reitz
  2020-11-13 16:07     ` Alberto Garcia
  0 siblings, 1 reply; 9+ messages in thread
From: Max Reitz @ 2020-11-13 11:49 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Tao Xu, qemu-block

On 11.11.20 17:53, Alberto Garcia wrote:
> This simply calls bdrv_co_pwrite_zeroes() in all children
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/quorum.c             | 18 ++++++++++++++++--
>   tests/qemu-iotests/312     |  7 +++++++
>   tests/qemu-iotests/312.out |  4 ++++
>   3 files changed, 27 insertions(+), 2 deletions(-)

Should we set supported_zero_flags to something?  I think we can at 
least set BDRV_REQ_NO_FALLBACK.  We could also try BDRV_REQ_FUA.

> diff --git a/block/quorum.c b/block/quorum.c
> index 9691a9bee9..c81572f513 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -692,8 +692,13 @@ static void write_quorum_entry(void *opaque)
>       QuorumChildRequest *sacb = &acb->qcrs[i];
>   
>       sacb->bs = s->children[i]->bs;
> -    sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
> -                                acb->qiov, acb->flags);
> +    if (acb->flags & BDRV_REQ_ZERO_WRITE) {
> +        sacb->ret = bdrv_co_pwrite_zeroes(s->children[i], acb->offset,
> +                                          acb->bytes, acb->flags);
> +    } else {
> +        sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
> +                                    acb->qiov, acb->flags);
> +    }

Seems unnecessary (bdrv_co_pwritev() can handle BDRV_REQ_ZERO_WRITE), 
but perhaps it’s good to be explicit.

>       if (sacb->ret == 0) {
>           acb->success_count++;
>       } else {

[...]

> diff --git a/tests/qemu-iotests/312 b/tests/qemu-iotests/312
> index 1b08f1552f..93046393e7 100755
> --- a/tests/qemu-iotests/312
> +++ b/tests/qemu-iotests/312
> @@ -114,6 +114,13 @@ $QEMU_IO -c "write -P 0 $((0x200000)) $((0x10000))" "$TEST_IMG.0" | _filter_qemu
>   $QEMU_IO -c "write -z   $((0x200000)) $((0x30000))" "$TEST_IMG.1" | _filter_qemu_io
>   $QEMU_IO -c "write -P 0 $((0x200000)) $((0x20000))" "$TEST_IMG.2" | _filter_qemu_io
>   
> +# Test 5: write data to a region and then zeroize it, doing it
> +# directly on the quorum device instead of the individual images.
> +# This has no effect on the end result but proves that the quorum driver
> +# supports 'write -z'.
> +$QEMU_IO -c "open -o $quorum" -c "write $((0x250000)) $((0x10000))" | _filter_qemu_io
> +$QEMU_IO -c "open -o $quorum" -c "write -z $((0x250000)) $((0x10000))" | _filter_qemu_io
> +

My gut would have preferred a test where the data region is larger than 
the zeroed region (so we can see that the first write has done 
something), but who cares about my gut.

I don’t mind not setting supported_zero_flags enough to warrant 
withholding a

Reviewed-by: Max Reitz <mreitz@redhat.com>

But I’ll give you some time to reply before I’d take this patch to 
block-next.  (That is, unless Kevin takes it during my two-week PTO...)

>   echo
>   echo '### Launch the drive-mirror job'
>   echo



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
  2020-11-13 11:49   ` Max Reitz
@ 2020-11-13 16:07     ` Alberto Garcia
  2020-11-13 16:11       ` Max Reitz
  0 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2020-11-13 16:07 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Tao Xu, qemu-block

On Fri 13 Nov 2020 12:49:04 PM CET, Max Reitz wrote:
> On 11.11.20 17:53, Alberto Garcia wrote:
>> This simply calls bdrv_co_pwrite_zeroes() in all children
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   block/quorum.c             | 18 ++++++++++++++++--
>>   tests/qemu-iotests/312     |  7 +++++++
>>   tests/qemu-iotests/312.out |  4 ++++
>>   3 files changed, 27 insertions(+), 2 deletions(-)
>
> Should we set supported_zero_flags to something?  I think we can at 
> least set BDRV_REQ_NO_FALLBACK.  We could also try BDRV_REQ_FUA.

We could set all supported_zero_flags as long as all children support
them, right? 

>> +    if (acb->flags & BDRV_REQ_ZERO_WRITE) {
>> +        sacb->ret = bdrv_co_pwrite_zeroes(s->children[i], acb->offset,
>> +                                          acb->bytes, acb->flags);
>> +    } else {
>> +        sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
>> +                                    acb->qiov, acb->flags);
>> +    }
>
> Seems unnecessary (bdrv_co_pwritev() can handle BDRV_REQ_ZERO_WRITE), 
> but perhaps it’s good to be explicit.

pwrite_zeroes() does this additionaly:

    if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
        flags &= ~BDRV_REQ_MAY_UNMAP;
    }

Berto


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
  2020-11-13 16:07     ` Alberto Garcia
@ 2020-11-13 16:11       ` Max Reitz
  2020-11-13 16:26         ` Alberto Garcia
  0 siblings, 1 reply; 9+ messages in thread
From: Max Reitz @ 2020-11-13 16:11 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Tao Xu, qemu-block

On 13.11.20 17:07, Alberto Garcia wrote:
> On Fri 13 Nov 2020 12:49:04 PM CET, Max Reitz wrote:
>> On 11.11.20 17:53, Alberto Garcia wrote:
>>> This simply calls bdrv_co_pwrite_zeroes() in all children
>>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>    block/quorum.c             | 18 ++++++++++++++++--
>>>    tests/qemu-iotests/312     |  7 +++++++
>>>    tests/qemu-iotests/312.out |  4 ++++
>>>    3 files changed, 27 insertions(+), 2 deletions(-)
>>
>> Should we set supported_zero_flags to something?  I think we can at
>> least set BDRV_REQ_NO_FALLBACK.  We could also try BDRV_REQ_FUA.
> 
> We could set all supported_zero_flags as long as all children support
> them, right?

Sure, I was just thinking that we could set these regardless of whether 
the children support them, because (on zero-writes) the block layer will 
figure out for us whether the child nodes support them. O:)

>>> +    if (acb->flags & BDRV_REQ_ZERO_WRITE) {
>>> +        sacb->ret = bdrv_co_pwrite_zeroes(s->children[i], acb->offset,
>>> +                                          acb->bytes, acb->flags);
>>> +    } else {
>>> +        sacb->ret = bdrv_co_pwritev(s->children[i], acb->offset, acb->bytes,
>>> +                                    acb->qiov, acb->flags);
>>> +    }
>>
>> Seems unnecessary (bdrv_co_pwritev() can handle BDRV_REQ_ZERO_WRITE),
>> but perhaps it’s good to be explicit.
> 
> pwrite_zeroes() does this additionaly:
> 
>      if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
>          flags &= ~BDRV_REQ_MAY_UNMAP;
>      }

Interesting.  Technically, Quorum doesn’t support that flag (in 
supported_zero_flags O:))), so it shouldn’t appear, but, er, well then.

Max



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
  2020-11-13 16:11       ` Max Reitz
@ 2020-11-13 16:26         ` Alberto Garcia
  2020-11-13 16:35           ` Max Reitz
  0 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2020-11-13 16:26 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Tao Xu, qemu-block

On Fri 13 Nov 2020 05:11:20 PM CET, Max Reitz wrote:

>> We could set all supported_zero_flags as long as all children support
>> them, right?
>
> Sure, I was just thinking that we could set these regardless of
> whether the children support them, because (on zero-writes) the block
> layer will figure out for us whether the child nodes support them. O:)

But it can happen that one child supports e.g. BDRV_REQ_NO_FALLBACK but
the rest don't. In this case I think the block layer should return
-ENOTSUP earlier without writing to the child(ren) that do support that
flag.

So Quorum's supported_zero_flags would be the logical and of all of its
children's flags, right?

I'm unsure about BDRV_REQ_WRITE_UNCHANGED, many filters set that on top
of the other flags, but when would a BDS not support this flag?

>> pwrite_zeroes() does this additionaly:
>> 
>>      if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
>>          flags &= ~BDRV_REQ_MAY_UNMAP;
>>      }
>
> Interesting.  Technically, Quorum doesn’t support that flag (in
> supported_zero_flags O:))), so it shouldn’t appear, but, er, well
> then.

It would with the change that I'm proposing above.

Berto


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes()
  2020-11-13 16:26         ` Alberto Garcia
@ 2020-11-13 16:35           ` Max Reitz
  0 siblings, 0 replies; 9+ messages in thread
From: Max Reitz @ 2020-11-13 16:35 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel; +Cc: Kevin Wolf, Tao Xu, qemu-block

On 13.11.20 17:26, Alberto Garcia wrote:
> On Fri 13 Nov 2020 05:11:20 PM CET, Max Reitz wrote:
> 
>>> We could set all supported_zero_flags as long as all children support
>>> them, right?
>>
>> Sure, I was just thinking that we could set these regardless of
>> whether the children support them, because (on zero-writes) the block
>> layer will figure out for us whether the child nodes support them. O:)
> 
> But it can happen that one child supports e.g. BDRV_REQ_NO_FALLBACK but
> the rest don't. In this case I think the block layer should return
> -ENOTSUP earlier without writing to the child(ren) that do support that
> flag.

That’s true.

> So Quorum's supported_zero_flags would be the logical and of all of its
> children's flags, right?

Yes.  (And so it would need to be recalculated whenever a child is added 
or removed.)

> I'm unsure about BDRV_REQ_WRITE_UNCHANGED, many filters set that on top
> of the other flags, but when would a BDS not support this flag?

The WRITE_UNCHANGED flag is basically just a hint for the permission 
system that for such a write, the WRITE_UNCHANGED permission is 
sufficient.  So drivers that can pass through *all* WRITE_UNCHANGED 
requests as they are (i.e., filter drivers) can support this write/zero 
flag and then pass that flag on.  This way, they are safe not to take 
the WRITE permission on their child unless their parent has taken the 
WRITE permission on them.

(Otherwise, they’d also have to take the WRITE permission if the parent 
only holds the WRITE_UNCHANGED permission.)

Supporting this flag doesn’t make sense for drivers that won’t be able 
to pass it on all the time.  For example, qcow2 generally cannot pass it 
on, because it still needs to perform WRITE_UNCHANGED requests as normal 
writes.  (WRITE_UNCHANGED comes from copy-on-read; the data will read 
the same, hence the _UNCHANGED, but it still needs to be allocated on 
the receiving format layer.)

(Technically, qcow2 could figure out whether the requests it generates 
from a WRITE_UNCHANGED request still result in unchanging writes in the 
protocol layer, but there is no reason to.  It needs the WRITE 
permission anyway, because there are going to be some non-unchanging 
writes it has to perform.  And if it holds the WRITE permission, there 
is no need to bother with the WRITE_UNCHANGED flag.)

>>> pwrite_zeroes() does this additionaly:
>>>
>>>       if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
>>>           flags &= ~BDRV_REQ_MAY_UNMAP;
>>>       }
>>
>> Interesting.  Technically, Quorum doesn’t support that flag (in
>> supported_zero_flags O:))), so it shouldn’t appear, but, er, well
>> then.
> 
> It would with the change that I'm proposing above.

Yes, I know.  Hence the “O:)”. O:)

Max



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-13 16:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-11 16:53 [PATCH v3 0/2] quorum: Implement bdrv_co_block_status() Alberto Garcia
2020-11-11 16:53 ` [PATCH v3 1/2] " Alberto Garcia
2020-11-13 11:36   ` Max Reitz
2020-11-11 16:53 ` [PATCH v3 2/2] quorum: Implement bdrv_co_pwrite_zeroes() Alberto Garcia
2020-11-13 11:49   ` Max Reitz
2020-11-13 16:07     ` Alberto Garcia
2020-11-13 16:11       ` Max Reitz
2020-11-13 16:26         ` Alberto Garcia
2020-11-13 16:35           ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.