qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v11 00/13] Apply COR-filter to the block-stream permanently
@ 2020-10-12 17:43 Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
                   ` (12 more replies)
  0 siblings, 13 replies; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

The iotest case test_stream_parallel still does not pass after the
COR-filter is inserted into the backing chain. As the test case may not
be initialized, it does not make a sense and was removed again.

v11:
  04: Base node overlay is used instead of base.
  05: Base node overlay is used instead of base.
  06: New.
  07: New.
  08: New.
  09: The new BDS-member 'supported_read_flags' is applied.
  10: The 'base_metadata' variable renamed to 'base_unfiltered'.
  11: New.
  12: The backing-file argument is left in the QMP interface. Warning added.
  13: The BDRV_REQ_COPY_ON_READ removed from the stream_populate();
      The 'implicit' initialization moved back to COR-filter driver.
      Base node overlay is used instead of base.

The v8 Message-Id:
<1601383109-110988-1-git-send-email-andrey.shinkevich@virtuozzo.com>

Andrey Shinkevich (13):
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  copy-on-read: pass overlay base node name to COR driver
  copy-on-read: limit COR operations to base in COR driver
  block: modify the comment for BDRV_REQ_PREFETCH flag
  block: include supported_read_flags into BDS structure
  copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter
  copy-on-read: skip non-guest reads if no copy needed
  stream: skip filters when writing backing file name to QCOW2 header
  stream: mark backing-file argument as deprecated
  stream: remove unused backing-file name parameter
  block: apply COR-filter to block-stream jobs

 block/copy-on-read.c           | 171 ++++++++++++++++++++++++++++++++++++++---
 block/copy-on-read.h           |  35 +++++++++
 block/io.c                     |   3 +-
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c                 | 112 ++++++++++++++++-----------
 blockdev.c                     |  25 +++---
 docs/system/deprecated.rst     |   6 ++
 include/block/block.h          |   7 +-
 include/block/block_int.h      |  13 +++-
 qapi/block-core.json           |   6 ++
 tests/qemu-iotests/030         |  51 ++----------
 tests/qemu-iotests/030.out     |   4 +-
 tests/qemu-iotests/141.out     |   2 +-
 tests/qemu-iotests/245         |  19 +++--
 14 files changed, 324 insertions(+), 134 deletions(-)
 create mode 100644 block/copy-on-read.h

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 02/13] copy-on-read: add filter append/drop functions Andrey Shinkevich via
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-on-read.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 2816e61..cb03e0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs)
 }
 
 
-static int coroutine_fn cor_co_preadv(BlockDriverState *bs,
-                                      uint64_t offset, uint64_t bytes,
-                                      QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
+                                           uint64_t offset, uint64_t bytes,
+                                           QEMUIOVector *qiov,
+                                           size_t qiov_offset,
+                                           int flags)
 {
-    return bdrv_co_preadv(bs->file, offset, bytes, qiov,
-                          flags | BDRV_REQ_COPY_ON_READ);
+    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+                               flags | BDRV_REQ_COPY_ON_READ);
 }
 
 
-static int coroutine_fn cor_co_pwritev(BlockDriverState *bs,
-                                       uint64_t offset, uint64_t bytes,
-                                       QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
+                                            uint64_t offset,
+                                            uint64_t bytes,
+                                            QEMUIOVector *qiov,
+                                            size_t qiov_offset, int flags)
 {
-
-    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+    return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+                                flags);
 }
 
 
@@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = {
 
     .bdrv_getlength                     = cor_getlength,
 
-    .bdrv_co_preadv                     = cor_co_preadv,
-    .bdrv_co_pwritev                    = cor_co_pwritev,
+    .bdrv_co_preadv_part                = cor_co_preadv_part,
+    .bdrv_co_pwritev_part               = cor_co_pwritev_part,
     .bdrv_co_pwrite_zeroes              = cor_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   = cor_co_pdiscard,
     .bdrv_co_pwritev_compressed         = cor_co_pwritev_compressed,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 02/13] copy-on-read: add filter append/drop functions
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 10:44   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 03/13] qapi: add filter-node-name to block-stream Andrey Shinkevich via
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Provide API for the COR-filter insertion/removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/copy-on-read.h | 35 +++++++++++++++++++++
 2 files changed, 123 insertions(+)
 create mode 100644 block/copy-on-read.h

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..bcccf0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,21 @@
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+    bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
+    BDRVStateCOR *state = bs->opaque;
+
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                                false, errp);
@@ -42,6 +52,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
             bs->file->bs->supported_zero_flags);
 
+    state->active = true;
+
+    /*
+     * We don't need to call bdrv_child_refresh_perms() now as the permissions
+     * will be updated later when the filter node gets its parent.
+     */
+
     return 0;
 }
 
@@ -57,6 +74,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild *c,
                            uint64_t perm, uint64_t shared,
                            uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVStateCOR *s = bs->opaque;
+
+    if (!s->active) {
+        /*
+         * While the filter is being removed
+         */
+        *nperm = 0;
+        *nshared = BLK_PERM_ALL;
+        return;
+    }
+
     *nperm = perm & PERM_PASSTHROUGH;
     *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +163,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked)
 
 static BlockDriver bdrv_copy_on_read = {
     .format_name                        = "copy-on-read",
+    .instance_size                      = sizeof(BDRVStateCOR),
 
     .bdrv_open                          = cor_open,
     .bdrv_child_perm                    = cor_child_perm,
@@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
     bdrv_register(&bdrv_copy_on_read);
 }
 
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         QDict *node_options,
+                                         int flags, Error **errp)
+{
+    BlockDriverState *cor_filter_bs;
+    Error *local_err = NULL;
+
+    cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
+    if (cor_filter_bs == NULL) {
+        error_prepend(errp, "Could not create COR-filter node: ");
+        return NULL;
+    }
+
+    if (!qdict_get_try_str(node_options, "node-name")) {
+        cor_filter_bs->implicit = true;
+    }
+
+    bdrv_drained_begin(bs);
+    bdrv_replace_node(bs, cor_filter_bs, &local_err);
+    bdrv_drained_end(bs);
+
+    if (local_err) {
+        bdrv_unref(cor_filter_bs);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return cor_filter_bs;
+}
+
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+    BdrvChild *child;
+    BlockDriverState *bs;
+    BDRVStateCOR *s = cor_filter_bs->opaque;
+
+    child = bdrv_filter_child(cor_filter_bs);
+    if (!child) {
+        return;
+    }
+    bs = child->bs;
+
+    /* Retain the BDS until we complete the graph change. */
+    bdrv_ref(bs);
+    /* Hold a guest back from writing while permissions are being reset. */
+    bdrv_drained_begin(bs);
+    /* Drop permissions before the graph change. */
+    s->active = false;
+    bdrv_child_refresh_perms(cor_filter_bs, child, &error_abort);
+    bdrv_replace_node(cor_filter_bs, bs, &error_abort);
+
+    bdrv_drained_end(bs);
+    bdrv_unref(bs);
+    bdrv_unref(cor_filter_bs);
+}
+
+
 block_init(bdrv_copy_on_read_init);
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 0000000..d6f2422
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,35 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 2018-2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef BLOCK_COPY_ON_READ
+#define BLOCK_COPY_ON_READ
+
+#include "block/block_int.h"
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         QDict *node_options,
+                                         int flags, Error **errp);
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
+
+#endif /* BLOCK_COPY_ON_READ */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 03/13] qapi: add filter-node-name to block-stream
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 02/13] copy-on-read: add filter append/drop functions Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver Andrey Shinkevich via
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/monitor/block-hmp-cmds.c | 4 ++--
 block/stream.c                 | 4 +++-
 blockdev.c                     | 4 +++-
 include/block/block_int.h      | 7 ++++++-
 qapi/block-core.json           | 6 ++++++
 5 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4d3db5e..4e66775 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 
     qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
                      false, NULL, qdict_haskey(qdict, "speed"), speed, true,
-                     BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
-                     &error);
+                     BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false,
+                     false, &error);
 
     hmp_handle_error(mon, error);
 }
diff --git a/block/stream.c b/block/stream.c
index 8ce6729..e0540ee 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = {
 void stream_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *base, const char *backing_file_str,
                   int creation_flags, int64_t speed,
-                  BlockdevOnError on_error, Error **errp)
+                  BlockdevOnError on_error,
+                  const char *filter_node_name,
+                  Error **errp)
 {
     StreamBlockJob *s;
     BlockDriverState *iter;
diff --git a/blockdev.c b/blockdev.c
index bebd3ba..d719c47 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2489,6 +2489,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
                       bool has_backing_file, const char *backing_file,
                       bool has_speed, int64_t speed,
                       bool has_on_error, BlockdevOnError on_error,
+                      bool has_filter_node_name, const char *filter_node_name,
                       bool has_auto_finalize, bool auto_finalize,
                       bool has_auto_dismiss, bool auto_dismiss,
                       Error **errp)
@@ -2571,7 +2572,8 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
-                 job_flags, has_speed ? speed : 0, on_error, &local_err);
+                 job_flags, has_speed ? speed : 0, on_error,
+                 filter_node_name, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto out;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 38cad9d..f782737 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1134,6 +1134,9 @@ int is_windows_drive(const char *filename);
  *                  See @BlockJobCreateFlags
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @on_error: The action to take upon error.
+ * @filter_node_name: The node name that should be assigned to the filter
+ * driver that the commit job inserts into the graph above @bs. NULL means
+ * that a node name should be autogenerated.
  * @errp: Error object.
  *
  * Start a streaming operation on @bs.  Clusters that are unallocated
@@ -1146,7 +1149,9 @@ int is_windows_drive(const char *filename);
 void stream_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *base, const char *backing_file_str,
                   int creation_flags, int64_t speed,
-                  BlockdevOnError on_error, Error **errp);
+                  BlockdevOnError on_error,
+                  const char *filter_node_name,
+                  Error **errp);
 
 /**
  * commit_start:
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 3c16f1e..32fb097 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2533,6 +2533,11 @@
 #            'stop' and 'enospc' can only be used if the block device
 #            supports io-status (see BlockInfo).  Since 1.3.
 #
+# @filter-node-name: the node name that should be assigned to the
+#                    filter driver that the stream job inserts into the graph
+#                    above @device. If this option is not given, a node name is
+#                    autogenerated. (Since: 5.2)
+#
 # @auto-finalize: When false, this job will wait in a PENDING state after it has
 #                 finished its work, waiting for @block-job-finalize before
 #                 making any block graph changes.
@@ -2563,6 +2568,7 @@
   'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
             '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
             '*on-error': 'BlockdevOnError',
+            '*filter-node-name': 'str',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
 
 ##
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (2 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 03/13] qapi: add filter-node-name to block-stream Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 11:09   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 05/13] copy-on-read: limit COR operations to base in " Andrey Shinkevich via
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

We are going to use the COR-filter for a block-stream job.
To limit COR operations by the base node in the backing chain during
stream job, pass the name of overlay base node to the copy-on-read
driver as base node itself may change due to possible concurrent jobs.
The rest of the functionality will be implemented in the patch that
follows.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index bcccf0f..c578b1b 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,19 +24,24 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qdict.h"
 #include "block/copy-on-read.h"
 
 
 typedef struct BDRVStateCOR {
     bool active;
+    BlockDriverState *base_overlay;
 } BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
+    BlockDriverState *base_overlay = NULL;
     BDRVStateCOR *state = bs->opaque;
+    /* We need the base overlay node rather than the base itself */
+    const char *base_overlay_node = qdict_get_try_str(options, "base");
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
             bs->file->bs->supported_zero_flags);
 
+    if (base_overlay_node) {
+        qdict_del(options, "base");
+        base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
+        if (!base_overlay) {
+            error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
+            return -EINVAL;
+        }
+    }
     state->active = true;
+    state->base_overlay = base_overlay;
 
     /*
      * We don't need to call bdrv_child_refresh_perms() now as the permissions
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (3 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 11:59   ` Max Reitz
  2020-10-14 12:01   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag Andrey Shinkevich via
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Limit COR operations by the base node in the backing chain when the
overlay base node name is given. It will be useful for a block stream
job when the COR-filter is applied. The overlay base node is passed as
the base itself may change due to concurrent commit jobs on the same
backing chain.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index c578b1b..dfbd6ad 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
                                            size_t qiov_offset,
                                            int flags)
 {
-    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
-                               flags | BDRV_REQ_COPY_ON_READ);
+    int64_t n = 0;
+    int64_t size = offset + bytes;
+    int local_flags;
+    int ret;
+    BDRVStateCOR *state = bs->opaque;
+
+    if (!state->base_overlay) {
+        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+                                   flags | BDRV_REQ_COPY_ON_READ);
+    }
+
+    while (offset < size) {
+        local_flags = flags;
+
+        /* In case of failure, try to copy-on-read anyway */
+        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
+        if (!ret) {
+            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
+                                          state->base_overlay, true, offset,
+                                          n, &n);
+            if (ret) {
+                local_flags |= BDRV_REQ_COPY_ON_READ;
+            }
+        }
+
+        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+                                  local_flags);
+        if (ret < 0) {
+            return ret;
+        }
+
+        offset += n;
+        qiov_offset += n;
+        bytes -= n;
+    }
+
+    return 0;
 }
 
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (4 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 05/13] copy-on-read: limit COR operations to base in " Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 12:22   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 07/13] block: include supported_read_flags into BDS structure Andrey Shinkevich via
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
use it alone and pass it to the COR-filter driver for further
processing.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 include/block/block.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 981ab5b..2b7efd1 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -71,9 +71,10 @@ typedef enum {
     BDRV_REQ_NO_FALLBACK        = 0x100,
 
     /*
-     * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
-     * on read request and means that caller doesn't really need data to be
-     * written to qiov parameter which may be NULL.
+     * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
+     * flag or when the COR-filter applied to read operations and means that
+     * caller doesn't really need data to be written to qiov parameter which
+     * may be NULL.
      */
     BDRV_REQ_PREFETCH  = 0x200,
     /* Mask of valid flags */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 07/13] block: include supported_read_flags into BDS structure
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (5 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 12:31   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter Andrey Shinkevich via
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Add the new member supported_read_flags to BlockDriverState structure.
It will control the BDRV_REQ_PREFETCH flag set for copy-on-read
operations.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 include/block/block_int.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index f782737..a142867 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -873,6 +873,10 @@ struct BlockDriverState {
     /* I/O Limits */
     BlockLimits bl;
 
+    /*
+     * Flags honored during pread (so far: BDRV_REQ_PREFETCH)
+     */
+    unsigned int supported_read_flags;
     /* Flags honored during pwrite (so far: BDRV_REQ_FUA,
      * BDRV_REQ_WRITE_UNCHANGED).
      * If a driver does not support BDRV_REQ_WRITE_UNCHANGED, those
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (6 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 07/13] block: include supported_read_flags into BDS structure Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 12:40   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed Andrey Shinkevich via
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags
of the COR-filter.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index dfbd6ad..b136895 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
+    bs->supported_read_flags = BDRV_REQ_PREFETCH;
     bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
         (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (7 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 12:51   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header Andrey Shinkevich via
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

If the flag BDRV_REQ_PREFETCH was set, pass it further to the
COR-driver to skip unneeded reading. It can be taken into account for
the COR-algorithms optimization. That check is being made during the
block stream job by the moment.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 13 +++++++++----
 block/io.c           |  3 ++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index b136895..278a11a 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
             }
         }
 
-        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
-                                  local_flags);
-        if (ret < 0) {
-            return ret;
+        if (!!(flags & BDRV_REQ_PREFETCH) &
+            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
+            /* Skip non-guest reads if no copy needed */
+        } else {
+            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
+                                      local_flags);
+            if (ret < 0) {
+                return ret;
+            }
         }
 
         offset += n;
diff --git a/block/io.c b/block/io.c
index 11df188..bff1808 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
 
     max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
     if (bytes <= max_bytes && bytes <= max_transfer) {
-        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
+        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
+                                 flags & bs->supported_read_flags);
         goto out;
     }
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (8 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 15:02   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 11/13] stream: mark backing-file argument as deprecated Andrey Shinkevich via
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Avoid writing a filter JSON-name to QCOW2 image when the backing file
is changed after the block stream job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/stream.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index e0540ee..51462bd 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
     BlockDriverState *bs = blk_bs(bjob->blk);
     BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
     BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
+    BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
     Error *local_err = NULL;
     int ret = 0;
 
@@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
 
     if (bdrv_cow_child(unfiltered_bs)) {
         const char *base_id = NULL, *base_fmt = NULL;
-        if (base) {
-            base_id = s->backing_file_str;
-            if (base->drv) {
-                base_fmt = base->drv->format_name;
+        if (base_unfiltered) {
+            base_id = base_unfiltered->filename;
+            if (base_unfiltered->drv) {
+                base_fmt = base_unfiltered->drv->format_name;
             }
         }
         bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 11/13] stream: mark backing-file argument as deprecated
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (9 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 15:03   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 12/13] stream: remove unused backing-file name parameter Andrey Shinkevich via
  2020-10-12 17:43 ` [PATCH v11 13/13] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

Whereas the block-stream job starts using a backing file name of the
base node overlay after the block-stream job completes, mark the QMP
'backing-file' argument as deprecated.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 docs/system/deprecated.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 8b3ab5b..7491fcf 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -285,6 +285,12 @@ details.
 The ``query-events`` command has been superseded by the more powerful
 and accurate ``query-qmp-schema`` command.
 
+``block-stream`` argument ``backing-file`` (since 5.2)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The argument ``backing-file`` is deprecated. QEMU uses a backing file
+name of the base node overlay after the block-stream job completes.
+
 chardev client socket with ``wait`` option (since 4.0)
 ''''''''''''''''''''''''''''''''''''''''''''''''''''''
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 12/13] stream: remove unused backing-file name parameter
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (10 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 11/13] stream: mark backing-file argument as deprecated Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 15:05   ` Max Reitz
  2020-10-12 17:43 ` [PATCH v11 13/13] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

The 'backing-file' argument is not used by the block-stream job. It
designates a backing file name to set in QCOW2 image header after the
block-stream job finished. A backing file name of the node above base
is used instead.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/stream.c            |  6 +-----
 blockdev.c                | 21 ++++++---------------
 include/block/block_int.h |  2 +-
 3 files changed, 8 insertions(+), 21 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 51462bd..d3e1812 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -34,7 +34,6 @@ typedef struct StreamBlockJob {
     BlockDriverState *base_overlay; /* COW overlay (stream from this) */
     BlockDriverState *above_base;   /* Node directly above the base */
     BlockdevOnError on_error;
-    char *backing_file_str;
     bool bs_read_only;
     bool chain_frozen;
 } StreamBlockJob;
@@ -103,8 +102,6 @@ static void stream_clean(Job *job)
         blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
         bdrv_reopen_set_read_only(bs, true, NULL);
     }
-
-    g_free(s->backing_file_str);
 }
 
 static int coroutine_fn stream_run(Job *job, Error **errp)
@@ -220,7 +217,7 @@ static const BlockJobDriver stream_job_driver = {
 };
 
 void stream_start(const char *job_id, BlockDriverState *bs,
-                  BlockDriverState *base, const char *backing_file_str,
+                  BlockDriverState *base,
                   int creation_flags, int64_t speed,
                   BlockdevOnError on_error,
                   const char *filter_node_name,
@@ -295,7 +292,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 
     s->base_overlay = base_overlay;
     s->above_base = above_base;
-    s->backing_file_str = g_strdup(backing_file_str);
     s->bs_read_only = bs_read_only;
     s->chain_frozen = true;
 
diff --git a/blockdev.c b/blockdev.c
index d719c47..019b6e0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2498,7 +2498,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     BlockDriverState *base_bs = NULL;
     AioContext *aio_context;
     Error *local_err = NULL;
-    const char *base_name = NULL;
     int job_flags = JOB_DEFAULT;
 
     if (!has_on_error) {
@@ -2526,7 +2525,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
             goto out;
         }
         assert(bdrv_get_aio_context(base_bs) == aio_context);
-        base_name = base;
     }
 
     if (has_base_node) {
@@ -2541,7 +2539,11 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
         }
         assert(bdrv_get_aio_context(base_bs) == aio_context);
         bdrv_refresh_filename(base_bs);
-        base_name = base_bs->filename;
+    }
+
+    if (has_backing_file) {
+        warn_report("Use of \"backing-file\" argument is deprecated; "
+                    "a backing file of the node above base is used instead");
     }
 
     /* Check for op blockers in the whole chain between bs and base */
@@ -2553,17 +2555,6 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
         }
     }
 
-    /* if we are streaming the entire chain, the result will have no backing
-     * file, and specifying one is therefore an error */
-    if (base_bs == NULL && has_backing_file) {
-        error_setg(errp, "backing file specified, but streaming the "
-                         "entire chain");
-        goto out;
-    }
-
-    /* backing_file string overrides base bs filename */
-    base_name = has_backing_file ? backing_file : base_name;
-
     if (has_auto_finalize && !auto_finalize) {
         job_flags |= JOB_MANUAL_FINALIZE;
     }
@@ -2571,7 +2562,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
         job_flags |= JOB_MANUAL_DISMISS;
     }
 
-    stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
+    stream_start(has_job_id ? job_id : NULL, bs, base_bs,
                  job_flags, has_speed ? speed : 0, on_error,
                  filter_node_name, &local_err);
     if (local_err) {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a142867..4f523c3 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1151,7 +1151,7 @@ int is_windows_drive(const char *filename);
  * BlockDriverState.
  */
 void stream_start(const char *job_id, BlockDriverState *bs,
-                  BlockDriverState *base, const char *backing_file_str,
+                  BlockDriverState *base,
                   int creation_flags, int64_t speed,
                   BlockdevOnError on_error,
                   const char *filter_node_name,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v11 13/13] block: apply COR-filter to block-stream jobs
  2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (11 preceding siblings ...)
  2020-10-12 17:43 ` [PATCH v11 12/13] stream: remove unused backing-file name parameter Andrey Shinkevich via
@ 2020-10-12 17:43 ` Andrey Shinkevich via
  2020-10-14 16:24   ` Max Reitz
  12 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich via @ 2020-10-12 17:43 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, fam, stefanha, armbru, jsnow,
	libvir-list, eblake, den, vsementsov, andrey.shinkevich

This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/stream.c             | 93 +++++++++++++++++++++++++++++-----------------
 tests/qemu-iotests/030     | 51 +++----------------------
 tests/qemu-iotests/030.out |  4 +-
 tests/qemu-iotests/141.out |  2 +-
 tests/qemu-iotests/245     | 19 +++++++---
 5 files changed, 81 insertions(+), 88 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index d3e1812..93564db 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -17,8 +17,10 @@
 #include "block/blockjob_int.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
 
 enum {
     /*
@@ -33,6 +35,8 @@ typedef struct StreamBlockJob {
     BlockJob common;
     BlockDriverState *base_overlay; /* COW overlay (stream from this) */
     BlockDriverState *above_base;   /* Node directly above the base */
+    BlockDriverState *cor_filter_bs;
+    BlockDriverState *target_bs;
     BlockdevOnError on_error;
     bool bs_read_only;
     bool chain_frozen;
@@ -43,8 +47,7 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
 {
     assert(bytes < SIZE_MAX);
 
-    return blk_co_preadv(blk, offset, bytes, NULL,
-                         BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH);
+    return blk_co_preadv(blk, offset, bytes, NULL, BDRV_REQ_PREFETCH);
 }
 
 static void stream_abort(Job *job)
@@ -52,23 +55,20 @@ static void stream_abort(Job *job)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
     if (s->chain_frozen) {
-        BlockJob *bjob = &s->common;
-        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+        bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
     }
 }
 
 static int stream_prepare(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-    BlockJob *bjob = &s->common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
-    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
     BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
     BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
     Error *local_err = NULL;
     int ret = 0;
 
-    bdrv_unfreeze_backing_chain(bs, s->above_base);
+    bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->above_base);
     s->chain_frozen = false;
 
     if (bdrv_cow_child(unfiltered_bs)) {
@@ -94,13 +94,14 @@ static void stream_clean(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
+
+    bdrv_cor_filter_drop(s->cor_filter_bs);
 
     /* Reopen the image back in read-only mode if necessary */
     if (s->bs_read_only) {
         /* Give up write permissions before making it read-only */
         blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
-        bdrv_reopen_set_read_only(bs, true, NULL);
+        bdrv_reopen_set_read_only(s->target_bs, true, NULL);
     }
 }
 
@@ -108,9 +109,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockBackend *blk = s->common.blk;
-    BlockDriverState *bs = blk_bs(blk);
-    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
-    bool enable_cor = !bdrv_cow_child(s->base_overlay);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
     int64_t len;
     int64_t offset = 0;
     uint64_t delay_ns = 0;
@@ -122,21 +121,12 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
         return 0;
     }
 
-    len = bdrv_getlength(bs);
+    len = bdrv_getlength(s->target_bs);
     if (len < 0) {
         return len;
     }
     job_progress_set_remaining(&s->common.job, len);
 
-    /* Turn on copy-on-read for the whole block device so that guest read
-     * requests help us make progress.  Only do this when copying the entire
-     * backing chain since the copy-on-read operation does not take base into
-     * account.
-     */
-    if (enable_cor) {
-        bdrv_enable_copy_on_read(bs);
-    }
-
     for ( ; offset < len; offset += n) {
         bool copy;
         int ret;
@@ -195,10 +185,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
         }
     }
 
-    if (enable_cor) {
-        bdrv_disable_copy_on_read(bs);
-    }
-
     /* Do not remove the backing file if an error was there but ignored. */
     return error;
 }
@@ -228,6 +214,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     bool bs_read_only;
     int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
     BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
+    BlockDriverState *cor_filter_bs = NULL;
     BlockDriverState *above_base;
 
     if (!base_overlay) {
@@ -262,17 +249,48 @@ void stream_start(const char *job_id, BlockDriverState *bs,
         }
     }
 
-    /* Prevent concurrent jobs trying to modify the graph structure here, we
-     * already have our own plans. Also don't allow resize as the image size is
-     * queried only at the job start and then cached. */
-    s = block_job_create(job_id, &stream_job_driver, NULL, bs,
-                         basic_flags | BLK_PERM_GRAPH_MOD,
-                         basic_flags | BLK_PERM_WRITE,
+    QDict *opts = qdict_new();
+
+    qdict_put_str(opts, "driver", "copy-on-read");
+    qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+    if (base_overlay) {
+        /* Pass the base_overlay rather than base */
+        qdict_put_str(opts, "base", base_overlay->node_name);
+    }
+    if (filter_node_name) {
+        qdict_put_str(opts, "node-name", filter_node_name);
+    }
+
+    cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp);
+    if (cor_filter_bs == NULL) {
+        goto fail;
+    }
+
+    if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
+        bdrv_cor_filter_drop(cor_filter_bs);
+        cor_filter_bs = NULL;
+        goto fail;
+    }
+
+    s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs,
+                         BLK_PERM_CONSISTENT_READ,
+                         basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
                          speed, creation_flags, NULL, NULL, errp);
     if (!s) {
         goto fail;
     }
 
+    /*
+     * Prevent concurrent jobs trying to modify the graph structure here, we
+     * already have our own plans. Also don't allow resize as the image size is
+     * queried only at the job start and then cached.
+     */
+    if (block_job_add_bdrv(&s->common, "active node", bs,
+                           basic_flags | BLK_PERM_GRAPH_MOD,
+                           basic_flags | BLK_PERM_WRITE, &error_abort)) {
+        goto fail;
+    }
+
     /* Block all intermediate nodes between bs and base, because they will
      * disappear from the chain after this operation. The streaming job reads
      * every block only once, assuming that it doesn't change, so forbid writes
@@ -292,6 +310,8 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 
     s->base_overlay = base_overlay;
     s->above_base = above_base;
+    s->cor_filter_bs = cor_filter_bs;
+    s->target_bs = bs;
     s->bs_read_only = bs_read_only;
     s->chain_frozen = true;
 
@@ -304,5 +324,10 @@ fail:
     if (bs_read_only) {
         bdrv_reopen_set_read_only(bs, true, NULL);
     }
-    bdrv_unfreeze_backing_chain(bs, above_base);
+    if (cor_filter_bs) {
+        bdrv_unfreeze_backing_chain(cor_filter_bs, above_base);
+        bdrv_cor_filter_drop(cor_filter_bs);
+    } else {
+        bdrv_unfreeze_backing_chain(bs, above_base);
+    }
 }
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index dcb4b5d..0064590 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -227,61 +227,20 @@ class TestParallelOps(iotests.QMPTestCase):
         for img in self.imgs:
             os.remove(img)
 
-    # Test that it's possible to run several block-stream operations
-    # in parallel in the same snapshot chain
-    @unittest.skipIf(os.environ.get('QEMU_CHECK_BLOCK_AUTO'), 'disabled in CI')
-    def test_stream_parallel(self):
-        self.assert_no_active_block_jobs()
-
-        # Check that the maps don't match before the streaming operations
-        for i in range(2, self.num_imgs, 2):
-            self.assertNotEqual(qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[i]),
-                                qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[i-1]),
-                                'image file map matches backing file before streaming')
-
-        # Create all streaming jobs
-        pending_jobs = []
-        for i in range(2, self.num_imgs, 2):
-            node_name = 'node%d' % i
-            job_id = 'stream-%s' % node_name
-            pending_jobs.append(job_id)
-            result = self.vm.qmp('block-stream', device=node_name, job_id=job_id, base=self.imgs[i-2], speed=1024)
-            self.assert_qmp(result, 'return', {})
-
-        for job in pending_jobs:
-            result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
-            self.assert_qmp(result, 'return', {})
-
-        # Wait for all jobs to be finished.
-        while len(pending_jobs) > 0:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_COMPLETED':
-                    job_id = self.dictpath(event, 'data/device')
-                    self.assertTrue(job_id in pending_jobs)
-                    self.assert_qmp_absent(event, 'data/error')
-                    pending_jobs.remove(job_id)
-
-        self.assert_no_active_block_jobs()
-        self.vm.shutdown()
-
-        # Check that all maps match now
-        for i in range(2, self.num_imgs, 2):
-            self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i]),
-                             qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i-1]),
-                             'image file map does not match backing file after streaming')
-
     # Test that it's not possible to perform two block-stream
     # operations if there are nodes involved in both.
     def test_overlapping_1(self):
         self.assert_no_active_block_jobs()
 
         # Set a speed limit to make sure that this job blocks the rest
-        result = self.vm.qmp('block-stream', device='node4', job_id='stream-node4', base=self.imgs[1], speed=1024*1024)
+        result = self.vm.qmp('block-stream', device='node4',
+                             job_id='stream-node4', base=self.imgs[1],
+                             filter_node_name='stream-filter', speed=1024*1024)
         self.assert_qmp(result, 'return', {})
 
         result = self.vm.qmp('block-stream', device='node5', job_id='stream-node5', base=self.imgs[2])
         self.assert_qmp(result, 'error/desc',
-            "Node 'node4' is busy: block device is in use by block job: stream")
+            "Node 'stream-filter' is busy: block device is in use by block job: stream")
 
         result = self.vm.qmp('block-stream', device='node3', job_id='stream-node3', base=self.imgs[2])
         self.assert_qmp(result, 'error/desc',
@@ -294,7 +253,7 @@ class TestParallelOps(iotests.QMPTestCase):
         # block-commit should also fail if it touches nodes used by the stream job
         result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[4], job_id='commit-node4')
         self.assert_qmp(result, 'error/desc',
-            "Node 'node4' is busy: block device is in use by block job: stream")
+            "Node 'stream-filter' is busy: block device is in use by block job: stream")
 
         result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[1], top=self.imgs[3], job_id='commit-node1')
         self.assert_qmp(result, 'error/desc',
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 6d9bee1..5eb508d 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-...........................
+..........................
 ----------------------------------------------------------------------
-Ran 27 tests
+Ran 26 tests
 
 OK
diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
index 08e0aec..028a16f 100644
--- a/tests/qemu-iotests/141.out
+++ b/tests/qemu-iotests/141.out
@@ -99,7 +99,7 @@ wrote 1048576/1048576 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
 {'execute': 'blockdev-del', 'arguments': {'node-name': 'drv0'}}
-{"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
+{"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block device is in use by block job: stream"}}
 {'execute': 'block-job-cancel', 'arguments': {'device': 'job0'}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index e60c832..940e85a 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -899,17 +899,26 @@ class TestBlockdevReopen(iotests.QMPTestCase):
         # make hd1 read-only and block-stream requires it to be read-write
         # (Which error message appears depends on whether the stream job is
         # already done with copying at this point.)
-        self.reopen(opts, {},
+        # As the COR-filter node is inserted into the backing chain with the
+        # 'block-stream' operation, we move the options to their proper nodes.
+        opts = hd_opts(1)
+        opts['backing'] = hd_opts(2)
+        opts['backing']['backing'] = None
+        self.reopen(opts, {'read-only': True},
             ["Can't set node 'hd1' to r/o with copy-on-read enabled",
              "Cannot make block node read-only, there is a writer on it"])
 
         # We can't remove hd2 while the stream job is ongoing
-        opts['backing']['backing'] = None
-        self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'")
+        opts['backing'] = None
+        self.reopen(opts, {'read-only': False},
+                    "Cannot change 'backing' link from 'hd1' to 'hd2'")
 
-        # We can detach hd1 from hd0 because it doesn't affect the stream job
+        # We can't detach hd1 from hd0 because there is the COR-filter implicit
+        # node in between.
+        opts = hd_opts(0)
         opts['backing'] = None
-        self.reopen(opts)
+        self.reopen(opts, {},
+                    "Cannot change backing link if 'hd0' has an implicit backing file")
 
         self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True)
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 02/13] copy-on-read: add filter append/drop functions
  2020-10-12 17:43 ` [PATCH v11 02/13] copy-on-read: add filter append/drop functions Andrey Shinkevich via
@ 2020-10-14 10:44   ` Max Reitz
  2020-10-14 14:28     ` Andrey Shinkevich
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 10:44 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 1981 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Provide API for the COR-filter insertion/removal.
> Also, drop the filter child permissions for an inactive state when the
> filter node is being removed.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/copy-on-read.h | 35 +++++++++++++++++++++
>  2 files changed, 123 insertions(+)
>  create mode 100644 block/copy-on-read.h
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index cb03e0f..bcccf0f 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c

[...]

> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>      bdrv_register(&bdrv_copy_on_read);
>  }
>  
> +
> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
> +                                         QDict *node_options,
> +                                         int flags, Error **errp)

I had hoped you could make this a generic block layer function. :(

(Because it really is rather generic)

*shrug*

Reviewed-by: Max Reitz <mreitz@redhat.com>

> +{
> +    BlockDriverState *cor_filter_bs;
> +    Error *local_err = NULL;
> +
> +    cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
> +    if (cor_filter_bs == NULL) {
> +        error_prepend(errp, "Could not create COR-filter node: ");
> +        return NULL;
> +    }
> +
> +    if (!qdict_get_try_str(node_options, "node-name")) {
> +        cor_filter_bs->implicit = true;
> +    }
> +
> +    bdrv_drained_begin(bs);
> +    bdrv_replace_node(bs, cor_filter_bs, &local_err);
> +    bdrv_drained_end(bs);
> +
> +    if (local_err) {
> +        bdrv_unref(cor_filter_bs);
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    return cor_filter_bs;
> +}


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-12 17:43 ` [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver Andrey Shinkevich via
@ 2020-10-14 11:09   ` Max Reitz
  2020-10-14 11:57     ` Max Reitz
  2020-10-14 16:08     ` Andrey Shinkevich
  0 siblings, 2 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 11:09 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 2728 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> We are going to use the COR-filter for a block-stream job.
> To limit COR operations by the base node in the backing chain during
> stream job, pass the name of overlay base node to the copy-on-read
> driver as base node itself may change due to possible concurrent jobs.
> The rest of the functionality will be implemented in the patch that
> follows.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Is there a reason why you didn’t add this option to QAPI (as part of a
yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.

> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index bcccf0f..c578b1b 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -24,19 +24,24 @@
>  #include "block/block_int.h"
>  #include "qemu/module.h"
>  #include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qdict.h"
>  #include "block/copy-on-read.h"
>  
>  
>  typedef struct BDRVStateCOR {
>      bool active;
> +    BlockDriverState *base_overlay;
>  } BDRVStateCOR;
>  
>  
>  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>                      Error **errp)
>  {
> +    BlockDriverState *base_overlay = NULL;
>      BDRVStateCOR *state = bs->opaque;
> +    /* We need the base overlay node rather than the base itself */
> +    const char *base_overlay_node = qdict_get_try_str(options, "base");

Shouldn’t it be called base-overlay or above-base then?

>  
>      bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>                                 BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
> @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>          ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>              bs->file->bs->supported_zero_flags);
>  
> +    if (base_overlay_node) {
> +        qdict_del(options, "base");
> +        base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);

I think this is a use-after-free.  The storage @base_overlay_node points
to belongs to a QString, which is referenced only by @options; so
deleting that element of @options should free that string.

Max

> +        if (!base_overlay) {
> +            error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
> +            return -EINVAL;
> +        }
> +    }
>      state->active = true;
> +    state->base_overlay = base_overlay;
>  
>      /*
>       * We don't need to call bdrv_child_refresh_perms() now as the permissions
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-14 11:09   ` Max Reitz
@ 2020-10-14 11:57     ` Max Reitz
  2020-10-14 14:56       ` Vladimir Sementsov-Ogievskiy
  2020-10-14 16:08     ` Andrey Shinkevich
  1 sibling, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 11:57 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 2009 bytes --]

On 14.10.20 13:09, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> We are going to use the COR-filter for a block-stream job.
>> To limit COR operations by the base node in the backing chain during
>> stream job, pass the name of overlay base node to the copy-on-read
>> driver as base node itself may change due to possible concurrent jobs.
>> The rest of the functionality will be implemented in the patch that
>> follows.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>  block/copy-on-read.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
> 
> Is there a reason why you didn’t add this option to QAPI (as part of a
> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
> 
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index bcccf0f..c578b1b 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -24,19 +24,24 @@
>>  #include "block/block_int.h"
>>  #include "qemu/module.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>>  #include "qapi/qmp/qdict.h"
>>  #include "block/copy-on-read.h"
>>  
>>  
>>  typedef struct BDRVStateCOR {
>>      bool active;
>> +    BlockDriverState *base_overlay;
>>  } BDRVStateCOR;
>>  
>>  
>>  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>                      Error **errp)
>>  {
>> +    BlockDriverState *base_overlay = NULL;
>>      BDRVStateCOR *state = bs->opaque;
>> +    /* We need the base overlay node rather than the base itself */
>> +    const char *base_overlay_node = qdict_get_try_str(options, "base");
> 
> Shouldn’t it be called base-overlay or above-base then?

While reviewing patch 5, I realized this sould indeed be base-overlay
(i.e. the next allocation-bearing layer above the base, not just a
filter on top of base), but that should be noted somewhere, of course.
Best would be the QAPI documentation for this option. O:)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-12 17:43 ` [PATCH v11 05/13] copy-on-read: limit COR operations to base in " Andrey Shinkevich via
@ 2020-10-14 11:59   ` Max Reitz
  2020-10-14 17:43     ` Andrey Shinkevich
  2020-10-14 12:01   ` Max Reitz
  1 sibling, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 11:59 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 2512 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>                                             size_t qiov_offset,
>                                             int flags)
>  {
> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -                               flags | BDRV_REQ_COPY_ON_READ);
> +    int64_t n = 0;
> +    int64_t size = offset + bytes;
> +    int local_flags;
> +    int ret;
> +    BDRVStateCOR *state = bs->opaque;
> +
> +    if (!state->base_overlay) {
> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> +                                   flags | BDRV_REQ_COPY_ON_READ);
> +    }
> +
> +    while (offset < size) {
> +        local_flags = flags;
> +
> +        /* In case of failure, try to copy-on-read anyway */
> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +        if (!ret) {

In case of failure, a negative value is going to be returned, we won’t
go into this conditional block, and local_flags isn’t going to contain
BDRV_REQ_COPY_ON_READ.

So the idea of CORing in case of failure sounds sound to me, but it
doesn’t look like that’s done.

> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),

I think this should either be bdrv_backing_chain_next() or we must rule
out the possibility of bs->file->bs being a filter somewhere.  I think
I’d prefer the former.

> +                                          state->base_overlay, true, offset,
> +                                          n, &n);
> +            if (ret) {

“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
|| ret < 0” probably needed above), but correct either way.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-12 17:43 ` [PATCH v11 05/13] copy-on-read: limit COR operations to base in " Andrey Shinkevich via
  2020-10-14 11:59   ` Max Reitz
@ 2020-10-14 12:01   ` Max Reitz
  2020-10-14 18:57     ` Andrey Shinkevich
  1 sibling, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 12:01 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 2762 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>                                             size_t qiov_offset,
>                                             int flags)
>  {
> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -                               flags | BDRV_REQ_COPY_ON_READ);
> +    int64_t n = 0;
> +    int64_t size = offset + bytes;

Just when I hit send I noticed that “end” would be a more fitting name
for this variable.

> +    int local_flags;
> +    int ret;
> +    BDRVStateCOR *state = bs->opaque;
> +
> +    if (!state->base_overlay) {
> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> +                                   flags | BDRV_REQ_COPY_ON_READ);
> +    }
> +
> +    while (offset < size) {

(because I got a bit confused looking at this)

(Though dropping @size (or @end) and just checking when @bytes becomes 0
should work, too.)

> +        local_flags = flags;
> +
> +        /* In case of failure, try to copy-on-read anyway */
> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +        if (!ret) {
> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> +                                          state->base_overlay, true, offset,
> +                                          n, &n);
> +            if (ret) {
> +                local_flags |= BDRV_REQ_COPY_ON_READ;
> +            }
> +        }

Furthermore, I just noticed – can the is_allocated functions not return
0 in @n, when @offset is a the EOF?  Is that something to look out for?
 (I’m not sure.)

Max

> +
> +        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +                                  local_flags);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        offset += n;
> +        qiov_offset += n;
> +        bytes -= n;
> +    }
> +
> +    return 0;
>  }
>  
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag
  2020-10-12 17:43 ` [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag Andrey Shinkevich via
@ 2020-10-14 12:22   ` Max Reitz
  2020-10-14 15:04     ` Vladimir Sementsov-Ogievskiy
  2020-10-14 19:57     ` Andrey Shinkevich
  0 siblings, 2 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 12:22 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 1993 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
> use it alone and pass it to the COR-filter driver for further
> processing.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  include/block/block.h | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 981ab5b..2b7efd1 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -71,9 +71,10 @@ typedef enum {
>      BDRV_REQ_NO_FALLBACK        = 0x100,
>  
>      /*
> -     * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
> -     * on read request and means that caller doesn't really need data to be
> -     * written to qiov parameter which may be NULL.
> +     * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
> +     * flag or when the COR-filter applied to read operations and means that

There’s some word missing here, but I’m not sure what it is...  At least
an “is” before “applied”.  Perhaps something like ”or when a COR filter
is involved (in read operations)” would be better.

> +     * caller doesn't really need data to be written to qiov parameter which

And this “written to” confused me for a second, because we’re reading
into qiov.  Technically, that means writing into the buffer, but, you know.

Could we rewrite the whole thing, perhaps?  Something like

“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
(i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a
COR filter), in which case it signals that the COR operation need not
read the data into memory (qiov), but only ensure it is copied to the
top layer (i.e., that COR is done).”

I don’t know.

Max

> +     * may be NULL.
>       */
>      BDRV_REQ_PREFETCH  = 0x200,
>      /* Mask of valid flags */
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 07/13] block: include supported_read_flags into BDS structure
  2020-10-12 17:43 ` [PATCH v11 07/13] block: include supported_read_flags into BDS structure Andrey Shinkevich via
@ 2020-10-14 12:31   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 12:31 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 1001 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Add the new member supported_read_flags to BlockDriverState structure.
> It will control the BDRV_REQ_PREFETCH flag set for copy-on-read
> operations.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  include/block/block_int.h | 4 ++++
>  1 file changed, 4 insertions(+)

Not sure what the problem with passing BDRV_REQ_PREFETCH to drivers that
aren’t the COR filter are would be, but:

Reviewed-by: Max Reitz <mreitz@redhat.com>

(I mean, outside of the context of COR the flag is undefined, so it can
be anything, but intuitively I’d assume it means “no need to read the
data to memory” then, too.  So if a driver doesn’t support it, nothing
bad should happen.  Hm.  Well, unless the driver passes the flag on
(unsuspectingly) and a metadata-bearing child doesn’t return any data
that the drviver needs.  Yeah, better to explicitly disallow the flag
for all unsupporting drivers then.)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter
  2020-10-12 17:43 ` [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter Andrey Shinkevich via
@ 2020-10-14 12:40   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 12:40 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 1258 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Add support for the BDRV_REQ_PREFETCH flag to the supported_write_flags

s/write/read/

> of the COR-filter.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index dfbd6ad..b136895 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -50,6 +50,7 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>          return -EINVAL;
>      }
>  
> +    bs->supported_read_flags = BDRV_REQ_PREFETCH;
>      bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
>          (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);

Then we mustn’t let cor_co_preadv_part() pass the flag on to
bdrv_co_preadv_part() unless BDRV_REQ_COPY_ON_READ is set, too.  I
suspect the following patch is going to do that, but in the meantime the
code is wrong.

Perhaps just swap both patches?

And by the way, I’m also missing a patch that makes the block layer
evaluate supported_read_flags and e.g. strip BDRV_REQ_PREFETCH if it
isn’t supported, before it gets passed to such a block driver.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-12 17:43 ` [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed Andrey Shinkevich via
@ 2020-10-14 12:51   ` Max Reitz
  2020-10-14 15:22     ` Vladimir Sementsov-Ogievskiy
  2020-10-14 20:49     ` Andrey Shinkevich
  0 siblings, 2 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 12:51 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 2674 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
> COR-driver to skip unneeded reading. It can be taken into account for
> the COR-algorithms optimization. That check is being made during the
> block stream job by the moment.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 13 +++++++++----
>  block/io.c           |  3 ++-
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index b136895..278a11a 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>              }
>          }
>  
> -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> -                                  local_flags);
> -        if (ret < 0) {
> -            return ret;
> +        if (!!(flags & BDRV_REQ_PREFETCH) &

How about dropping the double negation and using a logical && instead of
the binary &?

> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
> +            /* Skip non-guest reads if no copy needed */
> +        } else {

Hm.  I would have just written the negated form

(!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))

and put the “skip” comment above that condition.

(Since local_flags is initialized to flags, it can be written as a
single comparison, but that’s a matter of taste and I’m not going to
recommend either over the other:

((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
BDRV_REQ_PREFETCH)

)

> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +                                      local_flags);
> +            if (ret < 0) {
> +                return ret;
> +            }
>          }
>  
>          offset += n;
> diff --git a/block/io.c b/block/io.c
> index 11df188..bff1808 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>  
>      max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>      if (bytes <= max_bytes && bytes <= max_transfer) {
> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
> +                                 flags & bs->supported_read_flags);

Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
why it isn’t.

Max

>          goto out;
>      }
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 02/13] copy-on-read: add filter append/drop functions
  2020-10-14 10:44   ` Max Reitz
@ 2020-10-14 14:28     ` Andrey Shinkevich
  2020-10-14 16:26       ` Max Reitz
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-14 14:28 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 13:44, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Provide API for the COR-filter insertion/removal.
>> Also, drop the filter child permissions for an inactive state when the
>> filter node is being removed.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/copy-on-read.h | 35 +++++++++++++++++++++
>>   2 files changed, 123 insertions(+)
>>   create mode 100644 block/copy-on-read.h
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index cb03e0f..bcccf0f 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
> 
> [...]
> 
>> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>>       bdrv_register(&bdrv_copy_on_read);
>>   }
>>   
>> +
>> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
>> +                                         QDict *node_options,
>> +                                         int flags, Error **errp)
> 
> I had hoped you could make this a generic block layer function. :(
> 
> (Because it really is rather generic)
> 
> *shrug*

Actually, I did (and still can do) that for the 'append node' function 
only but not for the 'drop node' one so far...

diff --git a/block.c b/block.c
index 11ab55f..f41e876 100644
--- a/block.c
+++ b/block.c
@@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs)
      g_free(bs);
  }

+BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict 
*node_options,
+                                   int flags, Error **errp)
+{
+    BlockDriverState *new_node_bs;
+    Error *local_err = NULL;
+
+    new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
+    if (new_node_bs == NULL) {
+        error_prepend(errp, "Could not create node: ");
+        return NULL;
+    }
+
+    bdrv_drained_begin(bs);
+    bdrv_replace_node(bs, new_node_bs, &local_err);
+    bdrv_drained_end(bs);
+
+    if (local_err) {
+        bdrv_unref(new_node_bs);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return new_node_bs;
+}
+
+void bdrv_remove_node(BlockDriverState *bs)
+{
+    BdrvChild *child;
+    BlockDriverState *inferior_bs;
+
+    child = bdrv_filter_or_cow_child(bs);
+    if (!child) {
+        return;
+    }
+    inferior_bs = child->bs;
+
+    /* Retain the BDS until we complete the graph change. */
+    bdrv_ref(inferior_bs);
+    /* Hold a guest back from writing while permissions are being reset. */
+    bdrv_drained_begin(inferior_bs);
+    /* Refresh permissions before the graph change. */
+    bdrv_child_refresh_perms(bs, child, &error_abort);
+    bdrv_replace_node(bs, inferior_bs, &error_abort);
+
+    bdrv_drained_end(inferior_bs);
+    bdrv_unref(inferior_bs);
+    bdrv_unref(bs);
+}

So, it is an intermediate solution in this patch of the series. I am 
going to make both functions generic once Vladimir overhauls the QEMU 
permission update system. Otherwise, the COR-filter node cannot be 
removed from the backing chain gracefully.

Thank you for your r-b. If the next version comes, I can move the 
'append node' function only to the generic layer.

Andrey

> 
> Reviewed-by: Max Reitz <mreitz@redhat.com>
> 
>> +{
>> +    BlockDriverState *cor_filter_bs;
>> +    Error *local_err = NULL;
>> +
>> +    cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
>> +    if (cor_filter_bs == NULL) {
>> +        error_prepend(errp, "Could not create COR-filter node: ");
>> +        return NULL;
>> +    }
>> +
>> +    if (!qdict_get_try_str(node_options, "node-name")) {
>> +        cor_filter_bs->implicit = true;
>> +    }
>> +
>> +    bdrv_drained_begin(bs);
>> +    bdrv_replace_node(bs, cor_filter_bs, &local_err);
>> +    bdrv_drained_end(bs);
>> +
>> +    if (local_err) {
>> +        bdrv_unref(cor_filter_bs);
>> +        error_propagate(errp, local_err);
>> +        return NULL;
>> +    }
>> +
>> +    return cor_filter_bs;
>> +}
> 


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-14 11:57     ` Max Reitz
@ 2020-10-14 14:56       ` Vladimir Sementsov-Ogievskiy
  2020-10-14 16:27         ` Max Reitz
  0 siblings, 1 reply; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 14:56 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 14:57, Max Reitz wrote:
> On 14.10.20 13:09, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> We are going to use the COR-filter for a block-stream job.
>>> To limit COR operations by the base node in the backing chain during
>>> stream job, pass the name of overlay base node to the copy-on-read
>>> driver as base node itself may change due to possible concurrent jobs.
>>> The rest of the functionality will be implemented in the patch that
>>> follows.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>
>> Is there a reason why you didn’t add this option to QAPI (as part of a
>> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index bcccf0f..c578b1b 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -24,19 +24,24 @@
>>>   #include "block/block_int.h"
>>>   #include "qemu/module.h"
>>>   #include "qapi/error.h"
>>> +#include "qapi/qmp/qerror.h"
>>>   #include "qapi/qmp/qdict.h"
>>>   #include "block/copy-on-read.h"
>>>   
>>>   
>>>   typedef struct BDRVStateCOR {
>>>       bool active;
>>> +    BlockDriverState *base_overlay;
>>>   } BDRVStateCOR;
>>>   
>>>   
>>>   static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>>                       Error **errp)
>>>   {
>>> +    BlockDriverState *base_overlay = NULL;
>>>       BDRVStateCOR *state = bs->opaque;
>>> +    /* We need the base overlay node rather than the base itself */
>>> +    const char *base_overlay_node = qdict_get_try_str(options, "base");
>>
>> Shouldn’t it be called base-overlay or above-base then?
> 
> While reviewing patch 5, I realized this sould indeed be base-overlay
> (i.e. the next allocation-bearing layer above the base, not just a
> filter on top of base), but that should be noted somewhere, of course.
> Best would be the QAPI documentation for this option. O:)
> 

What about naming it just "bottom" or "bottom-node"? If we don't have base, it's strange to have something "relative to base". And we can document, that "bottom" must be a non-filter node in a backing chain of "top".


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header
  2020-10-12 17:43 ` [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header Andrey Shinkevich via
@ 2020-10-14 15:02   ` Max Reitz
  2020-10-14 15:40     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 15:02 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 1944 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Avoid writing a filter JSON-name to QCOW2 image when the backing file
> is changed after the block stream job.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/stream.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index e0540ee..51462bd 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>      BlockDriverState *bs = blk_bs(bjob->blk);
>      BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>      BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
> +    BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
>      Error *local_err = NULL;
>      int ret = 0;
>  
> @@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
>  
>      if (bdrv_cow_child(unfiltered_bs)) {
>          const char *base_id = NULL, *base_fmt = NULL;
> -        if (base) {
> -            base_id = s->backing_file_str;
> -            if (base->drv) {
> -                base_fmt = base->drv->format_name;
> +        if (base_unfiltered) {
> +            base_id = base_unfiltered->filename;

I think you have to leave this querying s->backing_file_str, and instead
change how qmp_block_stream() gets @base_name.  block-stream has a
backing-file parameter that can override the string that should be used
here.

(Or perhaps you can let qmp_block_stream() just set it to NULL if no
backing-file parameter is passed and then fall back to
base_unfiltered->filename only here.  Probably better, actually, in case
base_unfiltered is changed during the job run.)

Max

> +            if (base_unfiltered->drv) {
> +                base_fmt = base_unfiltered->drv->format_name;
>              }
>          }
>          bdrv_set_backing_hd(unfiltered_bs, base, &local_err);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 11/13] stream: mark backing-file argument as deprecated
  2020-10-12 17:43 ` [PATCH v11 11/13] stream: mark backing-file argument as deprecated Andrey Shinkevich via
@ 2020-10-14 15:03   ` Max Reitz
  2020-10-14 15:43     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 15:03 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 1068 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Whereas the block-stream job starts using a backing file name of the
> base node overlay after the block-stream job completes, mark the QMP
> 'backing-file' argument as deprecated.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  docs/system/deprecated.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> index 8b3ab5b..7491fcf 100644
> --- a/docs/system/deprecated.rst
> +++ b/docs/system/deprecated.rst
> @@ -285,6 +285,12 @@ details.
>  The ``query-events`` command has been superseded by the more powerful
>  and accurate ``query-qmp-schema`` command.
>  
> +``block-stream`` argument ``backing-file`` (since 5.2)
> +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
> +
> +The argument ``backing-file`` is deprecated. QEMU uses a backing file
> +name of the base node overlay after the block-stream job completes.
> +

Hm, why?  I don’t see the problem with it.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag
  2020-10-14 12:22   ` Max Reitz
@ 2020-10-14 15:04     ` Vladimir Sementsov-Ogievskiy
  2020-10-14 19:57     ` Andrey Shinkevich
  1 sibling, 0 replies; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 15:04 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 15:22, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
>> use it alone and pass it to the COR-filter driver for further
>> processing.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   include/block/block.h | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 981ab5b..2b7efd1 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -71,9 +71,10 @@ typedef enum {
>>       BDRV_REQ_NO_FALLBACK        = 0x100,
>>   
>>       /*
>> -     * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
>> -     * on read request and means that caller doesn't really need data to be
>> -     * written to qiov parameter which may be NULL.
>> +     * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
>> +     * flag or when the COR-filter applied to read operations and means that
> 
> There’s some word missing here, but I’m not sure what it is...  At least
> an “is” before “applied”.  Perhaps something like ”or when a COR filter
> is involved (in read operations)” would be better.
> 
>> +     * caller doesn't really need data to be written to qiov parameter which
> 
> And this “written to” confused me for a second, because we’re reading
> into qiov.  Technically, that means writing into the buffer, but, you know.
> 
> Could we rewrite the whole thing, perhaps?  Something like
> 
> “BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
> (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a
> COR filter), in which case it signals that the COR operation need not
> read the data into memory (qiov), but only ensure it is copied to the
> top layer (i.e., that COR is done).”
> 

Sounds good

> 
>> +     * may be NULL.
>>        */
>>       BDRV_REQ_PREFETCH  = 0x200,
>>       /* Mask of valid flags */
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 12/13] stream: remove unused backing-file name parameter
  2020-10-12 17:43 ` [PATCH v11 12/13] stream: remove unused backing-file name parameter Andrey Shinkevich via
@ 2020-10-14 15:05   ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 15:05 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 672 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> The 'backing-file' argument is not used by the block-stream job. It
> designates a backing file name to set in QCOW2 image header after the
> block-stream job finished. A backing file name of the node above base
> is used instead.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/stream.c            |  6 +-----
>  blockdev.c                | 21 ++++++---------------
>  include/block/block_int.h |  2 +-
>  3 files changed, 8 insertions(+), 21 deletions(-)

I don’t think you can deprecate a parameter and effectively remove
support for it in the same release.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-14 12:51   ` Max Reitz
@ 2020-10-14 15:22     ` Vladimir Sementsov-Ogievskiy
  2020-10-14 16:30       ` Max Reitz
  2020-10-21 20:43       ` Andrey Shinkevich
  2020-10-14 20:49     ` Andrey Shinkevich
  1 sibling, 2 replies; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 15:22 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 15:51, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>> COR-driver to skip unneeded reading. It can be taken into account for
>> the COR-algorithms optimization. That check is being made during the
>> block stream job by the moment.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 13 +++++++++----
>>   block/io.c           |  3 ++-
>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index b136895..278a11a 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>               }
>>           }
>>   
>> -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> -                                  local_flags);
>> -        if (ret < 0) {
>> -            return ret;
>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
> 
> How about dropping the double negation and using a logical && instead of
> the binary &?
> 
>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>> +            /* Skip non-guest reads if no copy needed */
>> +        } else {
> 
> Hm.  I would have just written the negated form
> 
> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
> 
> and put the “skip” comment above that condition.
> 
> (Since local_flags is initialized to flags, it can be written as a
> single comparison, but that’s a matter of taste and I’m not going to
> recommend either over the other:
> 
> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
> BDRV_REQ_PREFETCH)
> 
> )
> 
>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> +                                      local_flags);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>>           }
>>   
>>           offset += n;
>> diff --git a/block/io.c b/block/io.c
>> index 11df188..bff1808 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>>   
>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>> +                                 flags & bs->supported_read_flags);


When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.

Actually, if driver doesn't support the PREFETCH flag we should do nothing.


> 
> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
> why it isn’t.
> 


Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header
  2020-10-14 15:02   ` Max Reitz
@ 2020-10-14 15:40     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 15:40 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 18:02, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Avoid writing a filter JSON-name to QCOW2 image when the backing file
>> is changed after the block stream job.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/stream.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/stream.c b/block/stream.c
>> index e0540ee..51462bd 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -65,6 +65,7 @@ static int stream_prepare(Job *job)
>>       BlockDriverState *bs = blk_bs(bjob->blk);
>>       BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
>>       BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
>> +    BlockDriverState *base_unfiltered = bdrv_skip_filters(base);
>>       Error *local_err = NULL;
>>       int ret = 0;
>>   
>> @@ -73,10 +74,10 @@ static int stream_prepare(Job *job)
>>   
>>       if (bdrv_cow_child(unfiltered_bs)) {
>>           const char *base_id = NULL, *base_fmt = NULL;
>> -        if (base) {
>> -            base_id = s->backing_file_str;
>> -            if (base->drv) {
>> -                base_fmt = base->drv->format_name;
>> +        if (base_unfiltered) {
>> +            base_id = base_unfiltered->filename;
> 
> I think you have to leave this querying s->backing_file_str, and instead
> change how qmp_block_stream() gets @base_name.  block-stream has a
> backing-file parameter that can override the string that should be used
> here.
> 
> (Or perhaps you can let qmp_block_stream() just set it to NULL if no
> backing-file parameter is passed and then fall back to
> base_unfiltered->filename only here.  Probably better, actually, in case
> base_unfiltered is changed during the job run.)
> 

Agree with the way in brackets.

If user set backing-file parameter we should handle it here.

If backing-file is not set, we should use dynamically calculated unfiltered base in stream_prepare().

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 11/13] stream: mark backing-file argument as deprecated
  2020-10-14 15:03   ` Max Reitz
@ 2020-10-14 15:43     ` Vladimir Sementsov-Ogievskiy
  2020-10-15  9:01       ` Andrey Shinkevich
  0 siblings, 1 reply; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 15:43 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 18:03, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Whereas the block-stream job starts using a backing file name of the
>> base node overlay after the block-stream job completes, mark the QMP
>> 'backing-file' argument as deprecated.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   docs/system/deprecated.rst | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
>> index 8b3ab5b..7491fcf 100644
>> --- a/docs/system/deprecated.rst
>> +++ b/docs/system/deprecated.rst
>> @@ -285,6 +285,12 @@ details.
>>   The ``query-events`` command has been superseded by the more powerful
>>   and accurate ``query-qmp-schema`` command.
>>   
>> +``block-stream`` argument ``backing-file`` (since 5.2)
>> +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
>> +
>> +The argument ``backing-file`` is deprecated. QEMU uses a backing file
>> +name of the base node overlay after the block-stream job completes.
>> +
> 
> Hm, why?  I don’t see the problem with it.
> 

My wrong idea, sorry. I believed that the argument is unused when I were reviewing v10. But it actually become unused during the series and it is wrong.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-14 11:09   ` Max Reitz
  2020-10-14 11:57     ` Max Reitz
@ 2020-10-14 16:08     ` Andrey Shinkevich
  2020-10-14 16:18       ` Vladimir Sementsov-Ogievskiy
  2020-10-14 16:36       ` Max Reitz
  1 sibling, 2 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-14 16:08 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 14:09, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> We are going to use the COR-filter for a block-stream job.
>> To limit COR operations by the base node in the backing chain during
>> stream job, pass the name of overlay base node to the copy-on-read
>> driver as base node itself may change due to possible concurrent jobs.
>> The rest of the functionality will be implemented in the patch that
>> follows.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
> 
> Is there a reason why you didn’t add this option to QAPI (as part of a
> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
> 

I agree that passing a base overlay under the base option looks clumsy. 
We could pass the base node name and find its overlay ourselves here in 
cor_open(). In that case, we can use the existing QAPI.
The reason I used the existing QAPI is to make it easier for a user to 
operate with the traditional options and to keep things simple. So, the 
user shouldn't think what overlay or above-base node to pass.
If we introduce the specific BlockdevOptionsCor, what other options may 
come with?

>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index bcccf0f..c578b1b 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -24,19 +24,24 @@
>>   #include "block/block_int.h"
>>   #include "qemu/module.h"
>>   #include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>>   #include "qapi/qmp/qdict.h"
>>   #include "block/copy-on-read.h"
>>   
>>   
>>   typedef struct BDRVStateCOR {
>>       bool active;
>> +    BlockDriverState *base_overlay;
>>   } BDRVStateCOR;
>>   
>>   
>>   static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>                       Error **errp)
>>   {
>> +    BlockDriverState *base_overlay = NULL;
>>       BDRVStateCOR *state = bs->opaque;
>> +    /* We need the base overlay node rather than the base itself */
>> +    const char *base_overlay_node = qdict_get_try_str(options, "base");
> 
> Shouldn’t it be called base-overlay or above-base then?
> 

The base_overlay identifier is used below as the pointer to BS. The 
base_overlay_node stands for the name of the node. I used that 
identifier to differ between the types. And the above_base has another 
meaning per block/stream.c - it can be a temporary filter with a JSON-name.

>>   
>>       bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>>                                  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
>> @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>           ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>>               bs->file->bs->supported_zero_flags);
>>   
>> +    if (base_overlay_node) {
>> +        qdict_del(options, "base");
>> +        base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
> 
> I think this is a use-after-free.  The storage @base_overlay_node points
> to belongs to a QString, which is referenced only by @options; so
> deleting that element of @options should free that string.
> 
> Max
> 

I will swap those two function calls (bdrv_lookup_bs(); qdict_del();).
Thank you.

Andrey

>> +        if (!base_overlay) {
>> +            error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
>> +            return -EINVAL;
>> +        }
>> +    }
>>       state->active = true;
>> +    state->base_overlay = base_overlay;
>>   
>>       /*
>>        * We don't need to call bdrv_child_refresh_perms() now as the permissions
>>
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-14 16:08     ` Andrey Shinkevich
@ 2020-10-14 16:18       ` Vladimir Sementsov-Ogievskiy
  2020-10-14 16:36       ` Max Reitz
  1 sibling, 0 replies; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 16:18 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 19:08, Andrey Shinkevich wrote:
> On 14.10.2020 14:09, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> We are going to use the COR-filter for a block-stream job.
>>> To limit COR operations by the base node in the backing chain during
>>> stream job, pass the name of overlay base node to the copy-on-read
>>> driver as base node itself may change due to possible concurrent jobs.
>>> The rest of the functionality will be implemented in the patch that
>>> follows.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>
>> Is there a reason why you didn’t add this option to QAPI (as part of a
>> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
>>
> 
> I agree that passing a base overlay under the base option looks clumsy. We could pass the base node name and find its overlay ourselves here in cor_open(). In that case, we can use the existing QAPI.

Actually, there is no existing QAPI: if you don't modify qapi/*.json, user is not able to pass the option through QAPI. It's still possible to pass the option through command-line, or when create the filter internally (like we are going to do in block-stream), but not through QAPI. So, it's better to make a new QAPI parameter, to make the new option available for QMP interface.

> The reason I used the existing QAPI is to make it easier for a user to operate with the traditional options and to keep things simple. So, the user shouldn't think what overlay or above-base node to pass.
> If we introduce the specific BlockdevOptionsCor, what other options may come with?
> 
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index bcccf0f..c578b1b 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -24,19 +24,24 @@
>>>   #include "block/block_int.h"
>>>   #include "qemu/module.h"
>>>   #include "qapi/error.h"
>>> +#include "qapi/qmp/qerror.h"
>>>   #include "qapi/qmp/qdict.h"
>>>   #include "block/copy-on-read.h"
>>>   typedef struct BDRVStateCOR {
>>>       bool active;
>>> +    BlockDriverState *base_overlay;
>>>   } BDRVStateCOR;
>>>   static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>>                       Error **errp)
>>>   {
>>> +    BlockDriverState *base_overlay = NULL;
>>>       BDRVStateCOR *state = bs->opaque;
>>> +    /* We need the base overlay node rather than the base itself */
>>> +    const char *base_overlay_node = qdict_get_try_str(options, "base");
>>
>> Shouldn’t it be called base-overlay or above-base then?
>>
> 
> The base_overlay identifier is used below as the pointer to BS. The base_overlay_node stands for the name of the node. I used that identifier to differ between the types. And the above_base has another meaning per block/stream.c - it can be a temporary filter with a JSON-name.
> 
>>>       bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>>>                                  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
>>> @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>>           ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>>>               bs->file->bs->supported_zero_flags);
>>> +    if (base_overlay_node) {
>>> +        qdict_del(options, "base");
>>> +        base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
>>
>> I think this is a use-after-free.  The storage @base_overlay_node points
>> to belongs to a QString, which is referenced only by @options; so
>> deleting that element of @options should free that string.
>>
>> Max
>>
> 
> I will swap those two function calls (bdrv_lookup_bs(); qdict_del();).
> Thank you.
> 
> Andrey
> 
>>> +        if (!base_overlay) {
>>> +            error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
>>> +            return -EINVAL;
>>> +        }
>>> +    }
>>>       state->active = true;
>>> +    state->base_overlay = base_overlay;
>>>       /*
>>>        * We don't need to call bdrv_child_refresh_perms() now as the permissions
>>>
>>
>>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 13/13] block: apply COR-filter to block-stream jobs
  2020-10-12 17:43 ` [PATCH v11 13/13] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
@ 2020-10-14 16:24   ` Max Reitz
  2020-10-15 17:16     ` Andrey Shinkevich
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 16:24 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 8094 bytes --]

On 12.10.20 19:43, Andrey Shinkevich wrote:
> This patch completes the series with the COR-filter insertion for
> block-stream operations. Adding the filter makes it possible for copied
> regions to be discarded in backing files during the block-stream job,
> what will reduce the disk overuse.
> The COR-filter insertion incurs changes in the iotests case
> 245:test_block_stream_4 that reopens the backing chain during a
> block-stream job. There are changes in the iotests #030 as well.
> The iotests case 030:test_stream_parallel was deleted due to multiple
> conflicts between the concurrent job operations over the same backing
> chain. The base backing node for one job is the top node for another
> job. It may change due to the filter node inserted into the backing
> chain while both jobs are running. Another issue is that the parts of
> the backing chain are being frozen by the running job and may not be
> changed by the concurrent job when needed. The concept of the parallel
> jobs with common nodes is considered vital no more.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/stream.c             | 93 +++++++++++++++++++++++++++++-----------------
>  tests/qemu-iotests/030     | 51 +++----------------------
>  tests/qemu-iotests/030.out |  4 +-
>  tests/qemu-iotests/141.out |  2 +-
>  tests/qemu-iotests/245     | 19 +++++++---
>  5 files changed, 81 insertions(+), 88 deletions(-)

Looks like stream_run() could be a bit streamlined now (the allocation
checking should be unnecessary, unconditionally calling
stream_populate() should be sufficient), but not necessary now.

> diff --git a/block/stream.c b/block/stream.c
> index d3e1812..93564db 100644
> --- a/block/stream.c
> +++ b/block/stream.c

[...]

> @@ -94,13 +94,14 @@ static void stream_clean(Job *job)
>  {
>      StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
>      BlockJob *bjob = &s->common;
> -    BlockDriverState *bs = blk_bs(bjob->blk);
> +
> +    bdrv_cor_filter_drop(s->cor_filter_bs);
>  
>      /* Reopen the image back in read-only mode if necessary */
>      if (s->bs_read_only) {
>          /* Give up write permissions before making it read-only */
>          blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);

Perhaps it would be good to do so even before the filter is dropped.  I
don’t know whether moving bjob->blk from cor_filter_bs to target_bs
might cause problems otherwise.

> -        bdrv_reopen_set_read_only(bs, true, NULL);
> +        bdrv_reopen_set_read_only(s->target_bs, true, NULL);
>      }
>  }

[...]

> @@ -262,17 +249,48 @@ void stream_start(const char *job_id, BlockDriverState *bs,
>          }
>      }
>  
> -    /* Prevent concurrent jobs trying to modify the graph structure here, we
> -     * already have our own plans. Also don't allow resize as the image size is
> -     * queried only at the job start and then cached. */
> -    s = block_job_create(job_id, &stream_job_driver, NULL, bs,
> -                         basic_flags | BLK_PERM_GRAPH_MOD,
> -                         basic_flags | BLK_PERM_WRITE,
> +    QDict *opts = qdict_new();

Declaration should be done at the start of the block.

> +
> +    qdict_put_str(opts, "driver", "copy-on-read");
> +    qdict_put_str(opts, "file", bdrv_get_node_name(bs));
> +    if (base_overlay) {

@base_overlay is always non-NULL, this condition should check @base, I
think.

> +        /* Pass the base_overlay rather than base */
> +        qdict_put_str(opts, "base", base_overlay->node_name);
> +    }
> +    if (filter_node_name) {
> +        qdict_put_str(opts, "node-name", filter_node_name);
> +    }
> +
> +    cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp);
> +    if (cor_filter_bs == NULL) {
> +        goto fail;
> +    }
> +
> +    if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {

Is there a reason why we can’t combine this with the
bdrv_free_backing_chain() from bs down to above_base?  I mean, the
effect should be the same, just asking.

> +        bdrv_cor_filter_drop(cor_filter_bs);
> +        cor_filter_bs = NULL;
> +        goto fail;
> +    }
> +
> +    s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs,
> +                         BLK_PERM_CONSISTENT_READ,
> +                         basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,

Not that I’m an expert on the GRAPH_MOD permission, but why is this
shared here but not below?  Shouldn’t it be the same in both cases?
(Same for taking it as a permission.)

>                           speed, creation_flags, NULL, NULL, errp);
>      if (!s) {
>          goto fail;
>      }
>  
> +    /*
> +     * Prevent concurrent jobs trying to modify the graph structure here, we
> +     * already have our own plans. Also don't allow resize as the image size is
> +     * queried only at the job start and then cached.
> +     */
> +    if (block_job_add_bdrv(&s->common, "active node", bs,
> +                           basic_flags | BLK_PERM_GRAPH_MOD,
> +                           basic_flags | BLK_PERM_WRITE, &error_abort)) {
> +        goto fail;
> +    }
> +
>      /* Block all intermediate nodes between bs and base, because they will
>       * disappear from the chain after this operation. The streaming job reads
>       * every block only once, assuming that it doesn't change, so forbid writes

[...]

> diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
> index e60c832..940e85a 100755
> --- a/tests/qemu-iotests/245
> +++ b/tests/qemu-iotests/245
> @@ -899,17 +899,26 @@ class TestBlockdevReopen(iotests.QMPTestCase):
>          # make hd1 read-only and block-stream requires it to be read-write
>          # (Which error message appears depends on whether the stream job is
>          # already done with copying at this point.)

Hm.  Let’s look at the set of messages below... [1]

> -        self.reopen(opts, {},
> +        # As the COR-filter node is inserted into the backing chain with the
> +        # 'block-stream' operation, we move the options to their proper nodes.
> +        opts = hd_opts(1)

Oh, so this patch changes it so that only the subtree below hd1 is
reopened, and we don’t have to deal with the filter options.  Got it.
(I think.)

> +        opts['backing'] = hd_opts(2)
> +        opts['backing']['backing'] = None
> +        self.reopen(opts, {'read-only': True},
>              ["Can't set node 'hd1' to r/o with copy-on-read enabled",

[1]

This isn’t done anymore as of this patch.  So I don’t think this error
message can still appear.  Will some other message appear in its stead,
or is it always going to be the second one?

>               "Cannot make block node read-only, there is a writer on it"])
>  
>          # We can't remove hd2 while the stream job is ongoing
> -        opts['backing']['backing'] = None
> -        self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'")
> +        opts['backing'] = None
> +        self.reopen(opts, {'read-only': False},
> +                    "Cannot change 'backing' link from 'hd1' to 'hd2'")
>  
> -        # We can detach hd1 from hd0 because it doesn't affect the stream job
> +        # We can't detach hd1 from hd0 because there is the COR-filter implicit
> +        # node in between.
> +        opts = hd_opts(0)
>          opts['backing'] = None
> -        self.reopen(opts)
> +        self.reopen(opts, {},
> +                    "Cannot change backing link if 'hd0' has an implicit backing file")

Does “has an implicit backing file” mean that hd0 has an implicit node
(the COR filter) as its backing file?  And then reopening isn’t allowed
because the user supposedly doesn’t know about that implicit node?  If
so, makes sense.

Max

>  
>          self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True)
>  
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 02/13] copy-on-read: add filter append/drop functions
  2020-10-14 14:28     ` Andrey Shinkevich
@ 2020-10-14 16:26       ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 16:26 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 4159 bytes --]

On 14.10.20 16:28, Andrey Shinkevich wrote:
> On 14.10.2020 13:44, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Provide API for the COR-filter insertion/removal.
>>> Also, drop the filter child permissions for an inactive state when the
>>> filter node is being removed.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 88
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   block/copy-on-read.h | 35 +++++++++++++++++++++
>>>   2 files changed, 123 insertions(+)
>>>   create mode 100644 block/copy-on-read.h
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index cb03e0f..bcccf0f 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>
>> [...]
>>
>>> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>>>       bdrv_register(&bdrv_copy_on_read);
>>>   }
>>>   +
>>> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
>>> +                                         QDict *node_options,
>>> +                                         int flags, Error **errp)
>>
>> I had hoped you could make this a generic block layer function. :(
>>
>> (Because it really is rather generic)
>>
>> *shrug*
> 
> Actually, I did (and still can do) that for the 'append node' function
> only but not for the 'drop node' one so far...

Ah, yeah.

> diff --git a/block.c b/block.c
> index 11ab55f..f41e876 100644
> --- a/block.c
> +++ b/block.c
> @@ -4669,6 +4669,55 @@ static void bdrv_delete(BlockDriverState *bs)
>      g_free(bs);
>  }
> 
> +BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict
> *node_options,
> +                                   int flags, Error **errp)
> +{
> +    BlockDriverState *new_node_bs;
> +    Error *local_err = NULL;
> +
> +    new_node_bs =  bdrv_open(NULL, NULL, node_options, flags, errp);
> +    if (new_node_bs == NULL) {
> +        error_prepend(errp, "Could not create node: ");
> +        return NULL;
> +    }
> +
> +    bdrv_drained_begin(bs);
> +    bdrv_replace_node(bs, new_node_bs, &local_err);
> +    bdrv_drained_end(bs);
> +
> +    if (local_err) {
> +        bdrv_unref(new_node_bs);
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    return new_node_bs;
> +}
> +
> +void bdrv_remove_node(BlockDriverState *bs)
> +{
> +    BdrvChild *child;
> +    BlockDriverState *inferior_bs;
> +
> +    child = bdrv_filter_or_cow_child(bs);
> +    if (!child) {
> +        return;
> +    }
> +    inferior_bs = child->bs;
> +
> +    /* Retain the BDS until we complete the graph change. */
> +    bdrv_ref(inferior_bs);
> +    /* Hold a guest back from writing while permissions are being
> reset. */
> +    bdrv_drained_begin(inferior_bs);
> +    /* Refresh permissions before the graph change. */
> +    bdrv_child_refresh_perms(bs, child, &error_abort);
> +    bdrv_replace_node(bs, inferior_bs, &error_abort);
> +
> +    bdrv_drained_end(inferior_bs);
> +    bdrv_unref(inferior_bs);
> +    bdrv_unref(bs);
> +}
> 
> So, it is an intermediate solution in this patch of the series. I am
> going to make both functions generic once Vladimir overhauls the QEMU
> permission update system. Otherwise, the COR-filter node cannot be
> removed from the backing chain gracefully.

True.

> Thank you for your r-b. If the next version comes, I can move the
> 'append node' function only to the generic layer.

Thanks!  Even if you decide against it, I’m happy as long as there are
plans to perhaps make it generic at some point.

(I’m just afraid this function might be useful in some other case, too,
and we forget that it exists here already, and then end up
reimplementing it.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-14 14:56       ` Vladimir Sementsov-Ogievskiy
@ 2020-10-14 16:27         ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 16:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, libvir-list, qemu-devel, armbru, stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 2702 bytes --]

On 14.10.20 16:56, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 14:57, Max Reitz wrote:
>> On 14.10.20 13:09, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>> We are going to use the COR-filter for a block-stream job.
>>>> To limit COR operations by the base node in the backing chain during
>>>> stream job, pass the name of overlay base node to the copy-on-read
>>>> driver as base node itself may change due to possible concurrent jobs.
>>>> The rest of the functionality will be implemented in the patch that
>>>> follows.
>>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---
>>>>   block/copy-on-read.c | 14 ++++++++++++++
>>>>   1 file changed, 14 insertions(+)
>>>
>>> Is there a reason why you didn’t add this option to QAPI (as part of a
>>> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it
>>> there.
>>>
>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>> index bcccf0f..c578b1b 100644
>>>> --- a/block/copy-on-read.c
>>>> +++ b/block/copy-on-read.c
>>>> @@ -24,19 +24,24 @@
>>>>   #include "block/block_int.h"
>>>>   #include "qemu/module.h"
>>>>   #include "qapi/error.h"
>>>> +#include "qapi/qmp/qerror.h"
>>>>   #include "qapi/qmp/qdict.h"
>>>>   #include "block/copy-on-read.h"
>>>>       typedef struct BDRVStateCOR {
>>>>       bool active;
>>>> +    BlockDriverState *base_overlay;
>>>>   } BDRVStateCOR;
>>>>       static int cor_open(BlockDriverState *bs, QDict *options, int
>>>> flags,
>>>>                       Error **errp)
>>>>   {
>>>> +    BlockDriverState *base_overlay = NULL;
>>>>       BDRVStateCOR *state = bs->opaque;
>>>> +    /* We need the base overlay node rather than the base itself */
>>>> +    const char *base_overlay_node = qdict_get_try_str(options,
>>>> "base");
>>>
>>> Shouldn’t it be called base-overlay or above-base then?
>>
>> While reviewing patch 5, I realized this sould indeed be base-overlay

(Just realized that sounds different from how I meant it.  I meant that
 “above-base” would be wrong, so from the two, if any, it should be
“base-overlay”.)

>> (i.e. the next allocation-bearing layer above the base, not just a
>> filter on top of base), but that should be noted somewhere, of course.
>> Best would be the QAPI documentation for this option. O:)
>>
> 
> What about naming it just "bottom" or "bottom-node"? If we don't have
> base, it's strange to have something "relative to base". And we can
> document, that "bottom" must be a non-filter node in a backing chain of
> "top".

Works for me, too.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-14 15:22     ` Vladimir Sementsov-Ogievskiy
@ 2020-10-14 16:30       ` Max Reitz
  2020-10-14 16:39         ` Vladimir Sementsov-Ogievskiy
  2020-10-21 20:43       ` Andrey Shinkevich
  1 sibling, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-14 16:30 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, libvir-list, qemu-devel, armbru, stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 4111 bytes --]

On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 15:51, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>> COR-driver to skip unneeded reading. It can be taken into account for
>>> the COR-algorithms optimization. That check is being made during the
>>> block stream job by the moment.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 13 +++++++++----
>>>   block/io.c           |  3 ++-
>>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index b136895..278a11a 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -148,10 +148,15 @@ static int coroutine_fn
>>> cor_co_preadv_part(BlockDriverState *bs,
>>>               }
>>>           }
>>>   -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>> qiov_offset,
>>> -                                  local_flags);
>>> -        if (ret < 0) {
>>> -            return ret;
>>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
>>
>> How about dropping the double negation and using a logical && instead of
>> the binary &?
>>
>>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>>> +            /* Skip non-guest reads if no copy needed */
>>> +        } else {
>>
>> Hm.  I would have just written the negated form
>>
>> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
>>
>> and put the “skip” comment above that condition.
>>
>> (Since local_flags is initialized to flags, it can be written as a
>> single comparison, but that’s a matter of taste and I’m not going to
>> recommend either over the other:
>>
>> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
>> BDRV_REQ_PREFETCH)
>>
>> )
>>
>>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>> qiov_offset,
>>> +                                      local_flags);
>>> +            if (ret < 0) {
>>> +                return ret;
>>> +            }
>>>           }
>>>             offset += n;
>>> diff --git a/block/io.c b/block/io.c
>>> index 11df188..bff1808 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn
>>> bdrv_aligned_preadv(BdrvChild *child,
>>>         max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov,
>>> qiov_offset, 0);
>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>> +                                 flags & bs->supported_read_flags);
> 
> 
> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be)
> NULL. This means, that we can't just drop the flag when call the driver
> that doesn't support it.

True. :/

> Actually, if driver doesn't support the PREFETCH flag we should do nothing.

Hm.  But at least in the case of COR, PREFETCH is not something that can
be optimized to be a no-op (unless the COR is a no-op); it still denotes
a command that must be executed.

So if we can’t pass it to the driver, I don’t think we should do
nothing, but to return an error.  Or maybe we could even assert that it
isn’t set for drivers that don’t support it, because at least right now
such a case would just be a bug.

>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>> why it isn’t.
>>
> 
> 
> Could it be part of patch 07? I mean introduce new field
> supported_read_flags and handle it in generic code in one patch, prior
> to implementing support for it in COR driver.

Sure.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver
  2020-10-14 16:08     ` Andrey Shinkevich
  2020-10-14 16:18       ` Vladimir Sementsov-Ogievskiy
@ 2020-10-14 16:36       ` Max Reitz
  1 sibling, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-14 16:36 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 4711 bytes --]

On 14.10.20 18:08, Andrey Shinkevich wrote:
> On 14.10.2020 14:09, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> We are going to use the COR-filter for a block-stream job.
>>> To limit COR operations by the base node in the backing chain during
>>> stream job, pass the name of overlay base node to the copy-on-read
>>> driver as base node itself may change due to possible concurrent jobs.
>>> The rest of the functionality will be implemented in the patch that
>>> follows.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 14 ++++++++++++++
>>>   1 file changed, 14 insertions(+)
>>
>> Is there a reason why you didn’t add this option to QAPI (as part of a
>> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
>>
> 
> I agree that passing a base overlay under the base option looks clumsy.
> We could pass the base node name and find its overlay ourselves here in
> cor_open(). In that case, we can use the existing QAPI.

Sounds more complicated than to just split off BlockdevOptionsCor (from
BlockdevOptionsGenericFormat, like e.g. BlockdevOptionsRaw does it).

> The reason I used the existing QAPI is to make it easier for a user to
> operate with the traditional options and to keep things simple. So, the
> user shouldn't think what overlay or above-base node to pass.

Well, they don’t have to pass anything if they don’t need a bottom node.
 So unless they want to use a bottom/base-overlay node, it won’t become
more complicated.

> If we introduce the specific BlockdevOptionsCor, what other options may
> come with?

Nothing yet, I think.

>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index bcccf0f..c578b1b 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -24,19 +24,24 @@
>>>   #include "block/block_int.h"
>>>   #include "qemu/module.h"
>>>   #include "qapi/error.h"
>>> +#include "qapi/qmp/qerror.h"
>>>   #include "qapi/qmp/qdict.h"
>>>   #include "block/copy-on-read.h"
>>>       typedef struct BDRVStateCOR {
>>>       bool active;
>>> +    BlockDriverState *base_overlay;
>>>   } BDRVStateCOR;
>>>       static int cor_open(BlockDriverState *bs, QDict *options, int
>>> flags,
>>>                       Error **errp)
>>>   {
>>> +    BlockDriverState *base_overlay = NULL;
>>>       BDRVStateCOR *state = bs->opaque;
>>> +    /* We need the base overlay node rather than the base itself */
>>> +    const char *base_overlay_node = qdict_get_try_str(options, "base");
>>
>> Shouldn’t it be called base-overlay or above-base then?
>>
> 
> The base_overlay identifier is used below as the pointer to BS. The
> base_overlay_node stands for the name of the node. I used that
> identifier to differ between the types. And the above_base has another
> meaning per block/stream.c - it can be a temporary filter with a JSON-name.

Yes, agreed; I’m not talking about the variable identifier though.  I’m
talking about the option name, which in this version is just “base”.
(And whenever that option is used, once here, once later in
block/stream.c, there is an explanatory comment why we don’t pass the
base node through it, but the first overlay above the base.  Which makes
me believe perhaps it shouldn’t be called “base”.)

>>>         bs->file = bdrv_open_child(NULL, options, "file", bs,
>>> &child_of_bds,
>>>                                  BDRV_CHILD_FILTERED |
>>> BDRV_CHILD_PRIMARY,
>>> @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict
>>> *options, int flags,
>>>           ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>>>               bs->file->bs->supported_zero_flags);
>>>   +    if (base_overlay_node) {
>>> +        qdict_del(options, "base");
>>> +        base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);
>>
>> I think this is a use-after-free.  The storage @base_overlay_node points
>> to belongs to a QString, which is referenced only by @options; so
>> deleting that element of @options should free that string.
>>
>> Max
>>
> 
> I will swap those two function calls (bdrv_lookup_bs(); qdict_del();).
> Thank you.

Don’t forget that the error_setg() below also makes use of it:

>>> +        if (!base_overlay) {
>>> +            error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
>>> +            return -EINVAL;
>>> +        }
>>> +    }

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-14 16:30       ` Max Reitz
@ 2020-10-14 16:39         ` Vladimir Sementsov-Ogievskiy
  2020-10-15 15:49           ` Max Reitz
  0 siblings, 1 reply; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-14 16:39 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

14.10.2020 19:30, Max Reitz wrote:
> On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
>> 14.10.2020 15:51, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>>> COR-driver to skip unneeded reading. It can be taken into account for
>>>> the COR-algorithms optimization. That check is being made during the
>>>> block stream job by the moment.
>>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---
>>>>    block/copy-on-read.c | 13 +++++++++----
>>>>    block/io.c           |  3 ++-
>>>>    2 files changed, 11 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>> index b136895..278a11a 100644
>>>> --- a/block/copy-on-read.c
>>>> +++ b/block/copy-on-read.c
>>>> @@ -148,10 +148,15 @@ static int coroutine_fn
>>>> cor_co_preadv_part(BlockDriverState *bs,
>>>>                }
>>>>            }
>>>>    -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>>> qiov_offset,
>>>> -                                  local_flags);
>>>> -        if (ret < 0) {
>>>> -            return ret;
>>>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
>>>
>>> How about dropping the double negation and using a logical && instead of
>>> the binary &?
>>>
>>>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>>>> +            /* Skip non-guest reads if no copy needed */
>>>> +        } else {
>>>
>>> Hm.  I would have just written the negated form
>>>
>>> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
>>>
>>> and put the “skip” comment above that condition.
>>>
>>> (Since local_flags is initialized to flags, it can be written as a
>>> single comparison, but that’s a matter of taste and I’m not going to
>>> recommend either over the other:
>>>
>>> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
>>> BDRV_REQ_PREFETCH)
>>>
>>> )
>>>
>>>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>>> qiov_offset,
>>>> +                                      local_flags);
>>>> +            if (ret < 0) {
>>>> +                return ret;
>>>> +            }
>>>>            }
>>>>              offset += n;
>>>> diff --git a/block/io.c b/block/io.c
>>>> index 11df188..bff1808 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn
>>>> bdrv_aligned_preadv(BdrvChild *child,
>>>>          max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>>        if (bytes <= max_bytes && bytes <= max_transfer) {
>>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov,
>>>> qiov_offset, 0);
>>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>>> +                                 flags & bs->supported_read_flags);
>>
>>
>> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be)
>> NULL. This means, that we can't just drop the flag when call the driver
>> that doesn't support it.
> 
> True. :/
> 
>> Actually, if driver doesn't support the PREFETCH flag we should do nothing.
> 
> Hm.  But at least in the case of COR, PREFETCH is not something that can
> be optimized to be a no-op (unless the COR is a no-op); it still denotes
> a command that must be executed.
> 
> So if we can’t pass it to the driver, I don’t think we should do
> nothing, but to return an error.  Or maybe we could even assert that it
> isn’t set for drivers that don’t support it, because at least right now
> such a case would just be a bug.

Hmm. Reasonable..

So, let me summarize the cases:

1. bdrv_co_preadv(.. , PREFETCH | COR)

   In this case generic layer should handle both flags and pass flags=0 to driver

2. bdrv_co_preadv(.., PREFETCH)

2.1 driver supporst PREFETCH
   
   OK, pass PREFETCH to driver, no problems

2.2 driver doesn't support PREFETCH

   We can just abort() here, as the only source of PREFETCH without COR would be
   stream job driver, which must read from COR filter.

   More generic solution is to allocate temporary buffer (at least if qiov is NULL)
   and call underlying driver .preadv with flags=0 on that temporary buffer. But
   just abort() is simpler and should work for now.

> 
>>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>>> why it isn’t.
>>>
>>
>>
>> Could it be part of patch 07? I mean introduce new field
>> supported_read_flags and handle it in generic code in one patch, prior
>> to implementing support for it in COR driver.
> 
> Sure.
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-14 11:59   ` Max Reitz
@ 2020-10-14 17:43     ` Andrey Shinkevich
  0 siblings, 0 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-14 17:43 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 14:59, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Limit COR operations by the base node in the backing chain when the
>> overlay base node name is given. It will be useful for a block stream
>> job when the COR-filter is applied. The overlay base node is passed as
>> the base itself may change due to concurrent commit jobs on the same
>> backing chain.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index c578b1b..dfbd6ad 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>                                              size_t qiov_offset,
>>                                              int flags)
>>   {
>> -    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
>> -                               flags | BDRV_REQ_COPY_ON_READ);
>> +    int64_t n = 0;
>> +    int64_t size = offset + bytes;
>> +    int local_flags;
>> +    int ret;
>> +    BDRVStateCOR *state = bs->opaque;
>> +
>> +    if (!state->base_overlay) {
>> +        return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
>> +                                   flags | BDRV_REQ_COPY_ON_READ);
>> +    }
>> +
>> +    while (offset < size) {
>> +        local_flags = flags;
>> +
>> +        /* In case of failure, try to copy-on-read anyway */
>> +        ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
>> +        if (!ret) {
> 
> In case of failure, a negative value is going to be returned, we won’t
> go into this conditional block, and local_flags isn’t going to contain
> BDRV_REQ_COPY_ON_READ.
> 
> So the idea of CORing in case of failure sounds sound to me, but it
> doesn’t look like that’s done.
> 

Yes, it's obvious. That was just my fault to miss setting the additional 
condition for "ret < 0". Thank you for noticing that.

Andrey

>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> 
> I think this should either be bdrv_backing_chain_next() or we must rule
> out the possibility of bs->file->bs being a filter somewhere.  I think
> I’d prefer the former.
> 
>> +                                          state->base_overlay, true, offset,
>> +                                          n, &n);
>> +            if (ret) {
> 
> “ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
> || ret < 0” probably needed above), but correct either way.
> 
> Max
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-14 12:01   ` Max Reitz
@ 2020-10-14 18:57     ` Andrey Shinkevich
  2020-10-15 15:56       ` Max Reitz
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-14 18:57 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 15:01, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Limit COR operations by the base node in the backing chain when the
>> overlay base node name is given. It will be useful for a block stream
>> job when the COR-filter is applied. The overlay base node is passed as
>> the base itself may change due to concurrent commit jobs on the same
>> backing chain.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index c578b1b..dfbd6ad 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -122,8 +122,43 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>                                              size_t qiov_offset,
>>                                              int flags)
>>   {

[...]

>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>> +                                          state->base_overlay, true, offset,
>> +                                          n, &n);
>> +            if (ret) {
>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>> +            }
>> +        }
> 
> Furthermore, I just noticed – can the is_allocated functions not return
> 0 in @n, when @offset is a the EOF?  Is that something to look out for?
>   (I’m not sure.)
> 
> Max
> 

The check for EOF is managed earlier in the stream_run() for a 
block-stream job. For other cases of using the COR-filter, the check for 
EOF can be added to the cor_co_preadv_part().
I would be more than happy if we can escape the duplicated checking for 
is_allocated in the block-stream. But how the stream_run() can stop 
calling the blk_co_preadv() when EOF is reached if is_allocated removed 
from it? May the cor_co_preadv_part() return EOF (or other error code) 
to be handled by a caller if (ret == 0 && n == 0 && (flags & 
BDRV_REQ_PREFETCH)?

Andrey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag
  2020-10-14 12:22   ` Max Reitz
  2020-10-14 15:04     ` Vladimir Sementsov-Ogievskiy
@ 2020-10-14 19:57     ` Andrey Shinkevich
  1 sibling, 0 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-14 19:57 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 15:22, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> Modify the comment for the flag BDRV_REQ_PREFETCH as we are going to
>> use it alone and pass it to the COR-filter driver for further
>> processing.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   include/block/block.h | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 981ab5b..2b7efd1 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -71,9 +71,10 @@ typedef enum {
>>       BDRV_REQ_NO_FALLBACK        = 0x100,
>>   
>>       /*
>> -     * BDRV_REQ_PREFETCH may be used only together with BDRV_REQ_COPY_ON_READ
>> -     * on read request and means that caller doesn't really need data to be
>> -     * written to qiov parameter which may be NULL.
>> +     * BDRV_REQ_PREFETCH may be used together with the BDRV_REQ_COPY_ON_READ
>> +     * flag or when the COR-filter applied to read operations and means that
> 
> There’s some word missing here, but I’m not sure what it is...  At least
> an “is” before “applied”.  Perhaps something like ”or when a COR filter
> is involved (in read operations)” would be better.
> 
>> +     * caller doesn't really need data to be written to qiov parameter which
> 
> And this “written to” confused me for a second, because we’re reading
> into qiov.  Technically, that means writing into the buffer, but, you know.
> 
> Could we rewrite the whole thing, perhaps?  Something like
> 
> “BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
> (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when there is a
> COR filter), in which case it signals that the COR operation need not
> read the data into memory (qiov), but only ensure it is copied to the
> top layer (i.e., that COR is done).”
> 
> I don’t know.
> 
> Max
> 

I would modify a little:

“BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
(i.e., together with the BDRV_REQ_COPY_ON_READ flag or when a COR
filter is involved), in which case it signals that the COR operation
need not read the data into memory (qiov) but only ensure they are
copied to the top layer (i.e., that COR operation is done).”


>> +     * may be NULL.
>>        */
>>       BDRV_REQ_PREFETCH  = 0x200,
>>       /* Mask of valid flags */
>>
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-14 12:51   ` Max Reitz
  2020-10-14 15:22     ` Vladimir Sementsov-Ogievskiy
@ 2020-10-14 20:49     ` Andrey Shinkevich
  1 sibling, 0 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-14 20:49 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 15:51, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>> COR-driver to skip unneeded reading. It can be taken into account for
>> the COR-algorithms optimization. That check is being made during the
>> block stream job by the moment.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 13 +++++++++----
>>   block/io.c           |  3 ++-
>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index b136895..278a11a 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -148,10 +148,15 @@ static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
>>               }
>>           }
>>   
>> -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> -                                  local_flags);
>> -        if (ret < 0) {
>> -            return ret;
>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
> 
> How about dropping the double negation and using a logical && instead of
> the binary &?
> 

Yes, that's correct.

>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>> +            /* Skip non-guest reads if no copy needed */
>> +        } else {
> 
> Hm.  I would have just written the negated form
> 
> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
> 
> and put the “skip” comment above that condition.
> 
> (Since local_flags is initialized to flags, it can be written as a
> single comparison, but that’s a matter of taste and I’m not going to
> recommend either over the other:
> 

I played with the flags to make the idea obvious for the eye of a 
beholder: "we neither read nor write".
Comparing the BDRV_REQ_PREFETCH against the 'flags' means that the flag 
comes from outside of the function.
And the empty section means we do nothing in that case.
Eventually, I will pick up the brief expression below.
Thanks,

Andrey

> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
> BDRV_REQ_PREFETCH)
> 
> )
> 
>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
>> +                                      local_flags);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>>           }
>>   
>>           offset += n;
>> diff --git a/block/io.c b/block/io.c
>> index 11df188..bff1808 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>>   
>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>> +                                 flags & bs->supported_read_flags);
> 
> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
> why it isn’t.
> 
> Max
> 
>>           goto out;
>>       }
>>   
>>
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 11/13] stream: mark backing-file argument as deprecated
  2020-10-14 15:43     ` Vladimir Sementsov-Ogievskiy
@ 2020-10-15  9:01       ` Andrey Shinkevich
  0 siblings, 0 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-15  9:01 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

On 14.10.2020 18:43, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 18:03, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Whereas the block-stream job starts using a backing file name of the
>>> base node overlay after the block-stream job completes, mark the QMP
>>> 'backing-file' argument as deprecated.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   docs/system/deprecated.rst | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
>>> index 8b3ab5b..7491fcf 100644
>>> --- a/docs/system/deprecated.rst
>>> +++ b/docs/system/deprecated.rst
>>> @@ -285,6 +285,12 @@ details.
>>>   The ``query-events`` command has been superseded by the more powerful
>>>   and accurate ``query-qmp-schema`` command.
>>> +``block-stream`` argument ``backing-file`` (since 5.2)
>>> +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
>>> +
>>> +The argument ``backing-file`` is deprecated. QEMU uses a backing file
>>> +name of the base node overlay after the block-stream job completes.
>>> +
>>
>> Hm, why?  I don’t see the problem with it.
>>
> 
> My wrong idea, sorry. I believed that the argument is unused when I were 
> reviewing v10. But it actually become unused during the series and it is 
> wrong.
> 

I missed searching for calls to the qmp_block_stream() in the QEMU 
dinamically generated code. Will roll back.

Andrey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-14 16:39         ` Vladimir Sementsov-Ogievskiy
@ 2020-10-15 15:49           ` Max Reitz
  0 siblings, 0 replies; 56+ messages in thread
From: Max Reitz @ 2020-10-15 15:49 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, libvir-list, qemu-devel, armbru, stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 4890 bytes --]

On 14.10.20 18:39, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 19:30, Max Reitz wrote:
>> On 14.10.20 17:22, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.10.2020 15:51, Max Reitz wrote:
>>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>>>> COR-driver to skip unneeded reading. It can be taken into account for
>>>>> the COR-algorithms optimization. That check is being made during the
>>>>> block stream job by the moment.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>>> ---
>>>>>    block/copy-on-read.c | 13 +++++++++----
>>>>>    block/io.c           |  3 ++-
>>>>>    2 files changed, 11 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>>> index b136895..278a11a 100644
>>>>> --- a/block/copy-on-read.c
>>>>> +++ b/block/copy-on-read.c
>>>>> @@ -148,10 +148,15 @@ static int coroutine_fn
>>>>> cor_co_preadv_part(BlockDriverState *bs,
>>>>>                }
>>>>>            }
>>>>>    -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>>>> qiov_offset,
>>>>> -                                  local_flags);
>>>>> -        if (ret < 0) {
>>>>> -            return ret;
>>>>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
>>>>
>>>> How about dropping the double negation and using a logical &&
>>>> instead of
>>>> the binary &?
>>>>
>>>>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>>>>> +            /* Skip non-guest reads if no copy needed */
>>>>> +        } else {
>>>>
>>>> Hm.  I would have just written the negated form
>>>>
>>>> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
>>>>
>>>> and put the “skip” comment above that condition.
>>>>
>>>> (Since local_flags is initialized to flags, it can be written as a
>>>> single comparison, but that’s a matter of taste and I’m not going to
>>>> recommend either over the other:
>>>>
>>>> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
>>>> BDRV_REQ_PREFETCH)
>>>>
>>>> )
>>>>
>>>>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov,
>>>>> qiov_offset,
>>>>> +                                      local_flags);
>>>>> +            if (ret < 0) {
>>>>> +                return ret;
>>>>> +            }
>>>>>            }
>>>>>              offset += n;
>>>>> diff --git a/block/io.c b/block/io.c
>>>>> index 11df188..bff1808 100644
>>>>> --- a/block/io.c
>>>>> +++ b/block/io.c
>>>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn
>>>>> bdrv_aligned_preadv(BdrvChild *child,
>>>>>          max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>>>        if (bytes <= max_bytes && bytes <= max_transfer) {
>>>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov,
>>>>> qiov_offset, 0);
>>>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov,
>>>>> qiov_offset,
>>>>> +                                 flags & bs->supported_read_flags);
>>>
>>>
>>> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be)
>>> NULL. This means, that we can't just drop the flag when call the driver
>>> that doesn't support it.
>>
>> True. :/
>>
>>> Actually, if driver doesn't support the PREFETCH flag we should do
>>> nothing.
>>
>> Hm.  But at least in the case of COR, PREFETCH is not something that can
>> be optimized to be a no-op (unless the COR is a no-op); it still denotes
>> a command that must be executed.
>>
>> So if we can’t pass it to the driver, I don’t think we should do
>> nothing, but to return an error.  Or maybe we could even assert that it
>> isn’t set for drivers that don’t support it, because at least right now
>> such a case would just be a bug.
> 
> Hmm. Reasonable..
> 
> So, let me summarize the cases:
> 
> 1. bdrv_co_preadv(.. , PREFETCH | COR)
> 
>   In this case generic layer should handle both flags and pass flags=0
> to driver
> 
> 2. bdrv_co_preadv(.., PREFETCH)
> 
> 2.1 driver supporst PREFETCH
>     OK, pass PREFETCH to driver, no problems
> 
> 2.2 driver doesn't support PREFETCH
> 
>   We can just abort() here, as the only source of PREFETCH without COR
> would be
>   stream job driver, which must read from COR filter.
> 
>   More generic solution is to allocate temporary buffer (at least if
> qiov is NULL)
>   and call underlying driver .preadv with flags=0 on that temporary
> buffer. But
>   just abort() is simpler and should work for now.

Agreed.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-14 18:57     ` Andrey Shinkevich
@ 2020-10-15 15:56       ` Max Reitz
  2020-10-15 17:37         ` Andrey Shinkevich
  0 siblings, 1 reply; 56+ messages in thread
From: Max Reitz @ 2020-10-15 15:56 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, fam, vsementsov, libvir-list, qemu-devel, armbru,
	stefanha, den, jsnow


[-- Attachment #1.1: Type: text/plain, Size: 3211 bytes --]

On 14.10.20 20:57, Andrey Shinkevich wrote:
> On 14.10.2020 15:01, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> Limit COR operations by the base node in the backing chain when the
>>> overlay base node name is given. It will be useful for a block stream
>>> job when the COR-filter is applied. The overlay base node is passed as
>>> the base itself may change due to concurrent commit jobs on the same
>>> backing chain.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index c578b1b..dfbd6ad 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -122,8 +122,43 @@ static int coroutine_fn
>>> cor_co_preadv_part(BlockDriverState *bs,
>>>                                              size_t qiov_offset,
>>>                                              int flags)
>>>   {
> 
> [...]
> 
>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>> +                                          state->base_overlay, true,
>>> offset,
>>> +                                          n, &n);
>>> +            if (ret) {
>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>> +            }
>>> +        }
>>
>> Furthermore, I just noticed – can the is_allocated functions not return
>> 0 in @n, when @offset is a the EOF?  Is that something to look out for?
>>   (I’m not sure.)
>>
>> Max
>>
> 
> The check for EOF is managed earlier in the stream_run() for a
> block-stream job. For other cases of using the COR-filter, the check for
> EOF can be added to the cor_co_preadv_part().
> I would be more than happy if we can escape the duplicated checking for
> is_allocated in the block-stream. But how the stream_run() can stop
> calling the blk_co_preadv() when EOF is reached if is_allocated removed
> from it?

True.  Is it that bad to lose that optimization, though?  (And I would
expect the case of a short backing file to be rather rare, too.)

> May the cor_co_preadv_part() return EOF (or other error code)
> to be handled by a caller if (ret == 0 && n == 0 && (flags &
> BDRV_REQ_PREFETCH)?

That sounds like a bad hack.  I’d rather keep the double is_allocated().

But what would be the problem with losing the short backing file
optimization?  Just performance?  Or would we end up writing actual
zeroes into the overlay past the end of the backing file?  Hm, probably
not, if the COR filter would detect that case and handle it like stream
does.

So it seems only a question of performance to me, and I don’t think it
would be that bad to in this rather rare case to have a bunch of useless
is_allocated and is_allocated_above calls past the backing file’s EOF.
(Maybe I’m wrong, though.)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 13/13] block: apply COR-filter to block-stream jobs
  2020-10-14 16:24   ` Max Reitz
@ 2020-10-15 17:16     ` Andrey Shinkevich
  2020-10-16 15:06       ` Andrey Shinkevich
  2020-10-16 15:45       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-15 17:16 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 14.10.2020 19:24, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:

[...]

>> ---
>>   block/stream.c             | 93 +++++++++++++++++++++++++++++-----------------
>>   tests/qemu-iotests/030     | 51 +++----------------------
>>   tests/qemu-iotests/030.out |  4 +-
>>   tests/qemu-iotests/141.out |  2 +-
>>   tests/qemu-iotests/245     | 19 +++++++---
>>   5 files changed, 81 insertions(+), 88 deletions(-)
> 
> Looks like stream_run() could be a bit streamlined now (the allocation
> checking should be unnecessary, unconditionally calling
> stream_populate() should be sufficient), but not necessary now.
> 

That is what I had kept in my mind when I tackled this patch. But there 
is an underwater reef to streamline. Namely, how the block-stream job 
gets known about a long unallocated tail to exit the loop earlier in the 
stream_run(). Shall we return the '-EOF' or another error code from the 
cor_co_preadv_part() to be handled by the stream_run()? Any other 
suggestions, if any, will be appreciated.

>> diff --git a/block/stream.c b/block/stream.c
>> index d3e1812..93564db 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
> 
> [...]

>> +
>> +    cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, errp);
>> +    if (cor_filter_bs == NULL) {
>> +        goto fail;
>> +    }
>> +
>> +    if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
> 
> Is there a reason why we can’t combine this with the
> bdrv_free_backing_chain() from bs down to above_base?  I mean, the
> effect should be the same, just asking.
> 

The bdrv_freeze_backing_chain(bs, above_base, errp) is called before the 
bdrv_reopen_set_read_only() to keep the backing chain safe during the 
context switch. Then we will want to freeze the 'COR -> TOP BS' link as 
well. Freezing/unfreezing parts is simlier to manage than doing that 
with the whole chain.
If we decide to invoke the bdrv_reopen_set_read_only() after freezing 
the backing chain together with the COR-filter, we will not be able to 
get the 'write' permission on the read-only node.


>> +        bdrv_cor_filter_drop(cor_filter_bs);
>> +        cor_filter_bs = NULL;
>> +        goto fail;
>> +    }
>> +
>> +    s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs,
>> +                         BLK_PERM_CONSISTENT_READ,
>> +                         basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
> 
> Not that I’m an expert on the GRAPH_MOD permission, but why is this
> shared here but not below?  Shouldn’t it be the same in both cases?
> (Same for taking it as a permission.)
> 

When we invoke the block_job_add_bdrv(&s->common, "active node", bs,..) 
below (particularly, we need it to block the operations on the top node, 
bdrv_op_block_all()), we ask for the GRAPH_MOD permission for the top 
node. To allow that, the parent filter node should share that permission 
for the underlying node. Otherwise, we get assertion failed in the 
bdrv_check_update_perm() called from bdrv_replace_node() when we remove 
the filter.

>>                            speed, creation_flags, NULL, NULL, errp);
>>       if (!s) {
>>           goto fail;
>>       }
>>   
>> +    /*
>> +     * Prevent concurrent jobs trying to modify the graph structure here, we
>> +     * already have our own plans. Also don't allow resize as the image size is
>> +     * queried only at the job start and then cached.
>> +     */
>> +    if (block_job_add_bdrv(&s->common, "active node", bs,
>> +                           basic_flags | BLK_PERM_GRAPH_MOD,
>> +                           basic_flags | BLK_PERM_WRITE, &error_abort)) {
>> +        goto fail;
>> +    }
>> +
>>       /* Block all intermediate nodes between bs and base, because they will
>>        * disappear from the chain after this operation. The streaming job reads
>>        * every block only once, assuming that it doesn't change, so forbid writes
> 
> [...]
> 
>> diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
>> index e60c832..940e85a 100755
>> --- a/tests/qemu-iotests/245
>> +++ b/tests/qemu-iotests/245
>> @@ -899,17 +899,26 @@ class TestBlockdevReopen(iotests.QMPTestCase):
>>           # make hd1 read-only and block-stream requires it to be read-write
>>           # (Which error message appears depends on whether the stream job is
>>           # already done with copying at this point.)
> 
> Hm.  Let’s look at the set of messages below... [1]
> 
>> -        self.reopen(opts, {},
>> +        # As the COR-filter node is inserted into the backing chain with the
>> +        # 'block-stream' operation, we move the options to their proper nodes.
>> +        opts = hd_opts(1)
> 
> Oh, so this patch changes it so that only the subtree below hd1 is
> reopened, and we don’t have to deal with the filter options.  Got it.
> (I think.)
> 

Yes, that's right.

>> +        opts['backing'] = hd_opts(2)
>> +        opts['backing']['backing'] = None
>> +        self.reopen(opts, {'read-only': True},
>>               ["Can't set node 'hd1' to r/o with copy-on-read enabled",
> 
> [1]
> 
> This isn’t done anymore as of this patch.  So I don’t think this error
> message can still appear.  Will some other message appear in its stead,
> or is it always going to be the second one?
> 

The only second message appears in the test case when I run it on my 
node. So, I will remove the first one as the bdrv_enable_copy_on_read() 
is not called for the top BS on the frozen backing chain anymore.
Also, I will delet the part of the comment:
"(Which error message appears depends on whether the stream job is 
already done with copying at this point.)"

>>                "Cannot make block node read-only, there is a writer on it"])
>>   
>>           # We can't remove hd2 while the stream job is ongoing
>> -        opts['backing']['backing'] = None
>> -        self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'")
>> +        opts['backing'] = None
>> +        self.reopen(opts, {'read-only': False},
>> +                    "Cannot change 'backing' link from 'hd1' to 'hd2'")
>>   
>> -        # We can detach hd1 from hd0 because it doesn't affect the stream job
>> +        # We can't detach hd1 from hd0 because there is the COR-filter implicit
>> +        # node in between.
>> +        opts = hd_opts(0)
>>           opts['backing'] = None
>> -        self.reopen(opts)
>> +        self.reopen(opts, {},
>> +                    "Cannot change backing link if 'hd0' has an implicit backing file")
> 
> Does “has an implicit backing file” mean that hd0 has an implicit node
> (the COR filter) as its backing file?  And then reopening isn’t allowed
> because the user supposedly doesn’t know about that implicit node?  If
> so, makes sense.

Yes, it is.

Andrey

> 
> Max
> 
>>   
>>           self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True)
>>   
>>
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-15 15:56       ` Max Reitz
@ 2020-10-15 17:37         ` Andrey Shinkevich
  2020-10-16 14:28           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-15 17:37 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 15.10.2020 18:56, Max Reitz wrote:
> On 14.10.20 20:57, Andrey Shinkevich wrote:
>> On 14.10.2020 15:01, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>> Limit COR operations by the base node in the backing chain when the
>>>> overlay base node name is given. It will be useful for a block stream
>>>> job when the COR-filter is applied. The overlay base node is passed as
>>>> the base itself may change due to concurrent commit jobs on the same
>>>> backing chain.
>>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---
>>>>    block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 37 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>> index c578b1b..dfbd6ad 100644
>>>> --- a/block/copy-on-read.c
>>>> +++ b/block/copy-on-read.c
>>>> @@ -122,8 +122,43 @@ static int coroutine_fn
>>>> cor_co_preadv_part(BlockDriverState *bs,
>>>>                                               size_t qiov_offset,
>>>>                                               int flags)
>>>>    {
>>
>> [...]
>>
>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>>> +                                          state->base_overlay, true,
>>>> offset,
>>>> +                                          n, &n);
>>>> +            if (ret) {
>>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>>> +            }
>>>> +        }
>>>
>>> Furthermore, I just noticed – can the is_allocated functions not return
>>> 0 in @n, when @offset is a the EOF?  Is that something to look out for?
>>>    (I’m not sure.)
>>>
>>> Max
>>>
>>
>> The check for EOF is managed earlier in the stream_run() for a
>> block-stream job. For other cases of using the COR-filter, the check for
>> EOF can be added to the cor_co_preadv_part().
>> I would be more than happy if we can escape the duplicated checking for
>> is_allocated in the block-stream. But how the stream_run() can stop
>> calling the blk_co_preadv() when EOF is reached if is_allocated removed
>> from it?
> 
> True.  Is it that bad to lose that optimization, though?  (And I would
> expect the case of a short backing file to be rather rare, too.)
> 
>> May the cor_co_preadv_part() return EOF (or other error code)
>> to be handled by a caller if (ret == 0 && n == 0 && (flags &
>> BDRV_REQ_PREFETCH)?
> 
> That sounds like a bad hack.  I’d rather keep the double is_allocated().
> 
> But what would be the problem with losing the short backing file
> optimization?  Just performance?  Or would we end up writing actual
> zeroes into the overlay past the end of the backing file?  Hm, probably
> not, if the COR filter would detect that case and handle it like stream
> does.
> 
> So it seems only a question of performance to me, and I don’t think it
> would be that bad to in this rather rare case to have a bunch of useless
> is_allocated and is_allocated_above calls past the backing file’s EOF.
> (Maybe I’m wrong, though.)
> 
> Max
> 

Thank you, Max, for sharing your thoughts on this subject.
The double check for the is_allocated in the stream_run() is a 
performance degradation also.
And we will make a check for the EOF in the cor_co_preadv_part() in 
either case, won't we?

Andrey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver
  2020-10-15 17:37         ` Andrey Shinkevich
@ 2020-10-16 14:28           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-16 14:28 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

15.10.2020 20:37, Andrey Shinkevich wrote:
> On 15.10.2020 18:56, Max Reitz wrote:
>> On 14.10.20 20:57, Andrey Shinkevich wrote:
>>> On 14.10.2020 15:01, Max Reitz wrote:
>>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>>> Limit COR operations by the base node in the backing chain when the
>>>>> overlay base node name is given. It will be useful for a block stream
>>>>> job when the COR-filter is applied. The overlay base node is passed as
>>>>> the base itself may change due to concurrent commit jobs on the same
>>>>> backing chain.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>>> ---
>>>>>    block/copy-on-read.c | 39 +++++++++++++++++++++++++++++++++++++--
>>>>>    1 file changed, 37 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>>> index c578b1b..dfbd6ad 100644
>>>>> --- a/block/copy-on-read.c
>>>>> +++ b/block/copy-on-read.c
>>>>> @@ -122,8 +122,43 @@ static int coroutine_fn
>>>>> cor_co_preadv_part(BlockDriverState *bs,
>>>>>                                               size_t qiov_offset,
>>>>>                                               int flags)
>>>>>    {
>>>
>>> [...]
>>>
>>>>> +            ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
>>>>> +                                          state->base_overlay, true,
>>>>> offset,
>>>>> +                                          n, &n);
>>>>> +            if (ret) {
>>>>> +                local_flags |= BDRV_REQ_COPY_ON_READ;
>>>>> +            }
>>>>> +        }
>>>>
>>>> Furthermore, I just noticed – can the is_allocated functions not return
>>>> 0 in @n, when @offset is a the EOF?  Is that something to look out for?
>>>>    (I’m not sure.)
>>>>
>>>> Max
>>>>
>>>
>>> The check for EOF is managed earlier in the stream_run() for a
>>> block-stream job. For other cases of using the COR-filter, the check for
>>> EOF can be added to the cor_co_preadv_part().
>>> I would be more than happy if we can escape the duplicated checking for
>>> is_allocated in the block-stream. But how the stream_run() can stop
>>> calling the blk_co_preadv() when EOF is reached if is_allocated removed
>>> from it?
>>
>> True.  Is it that bad to lose that optimization, though?  (And I would
>> expect the case of a short backing file to be rather rare, too.)
>>
>>> May the cor_co_preadv_part() return EOF (or other error code)
>>> to be handled by a caller if (ret == 0 && n == 0 && (flags &
>>> BDRV_REQ_PREFETCH)?
>>
>> That sounds like a bad hack.  I’d rather keep the double is_allocated().
>>
>> But what would be the problem with losing the short backing file
>> optimization?  Just performance?  Or would we end up writing actual
>> zeroes into the overlay past the end of the backing file?  Hm, probably
>> not, if the COR filter would detect that case and handle it like stream
>> does.
>>
>> So it seems only a question of performance to me, and I don’t think it
>> would be that bad to in this rather rare case to have a bunch of useless
>> is_allocated and is_allocated_above calls past the backing file’s EOF.
>> (Maybe I’m wrong, though.)
>>
>> Max
>>
> 
> Thank you, Max, for sharing your thoughts on this subject.
> The double check for the is_allocated in the stream_run() is a performance degradation also.
> And we will make a check for the EOF in the cor_co_preadv_part() in either case, won't we?
> 
> Andrey


I'd keep is_allocated logic in block-stream as is for now. It's not good that we check block-status several times (in block-stream, than in cor filter, than in generic COR code), but it shouldn't be real problem, and we can postpone optimizations for the next step.

Also, the resulting architecture is not final. I believe that in bright future block-stream will work through block-copy like backup does. And COR filter will call block_copy() by itself, and generic COR code will be dropped together with BDRV_REQ_COR flag. And stream will do just one background call of block_copy for the whole device, like backup in finish on my in-flight backup series. And all extra levels of block_status checking will leave.

About EOF problem discussed here, let's look at more generic problem: we are going to skip _large_ area, but skipping chunk-by-chunk is inefficient. So, we just want to learn to skip large areas. The simplest way is just to call is_allocated/is_allocated_above
from current offset to device end, if we decided to skip current chunk. Then we'll know how much to skip totally. But that kind of optimization is not directly related to these series and may be done in separate if needed.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 13/13] block: apply COR-filter to block-stream jobs
  2020-10-15 17:16     ` Andrey Shinkevich
@ 2020-10-16 15:06       ` Andrey Shinkevich
  2020-10-16 15:45       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-16 15:06 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den, vsementsov

On 15.10.2020 20:16, Andrey Shinkevich wrote:
> On 14.10.2020 19:24, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
> 
> [...]
> 
>>> ---
>>>   block/stream.c             | 93 
>>> +++++++++++++++++++++++++++++-----------------
>>>   tests/qemu-iotests/030     | 51 +++----------------------
>>>   tests/qemu-iotests/030.out |  4 +-
>>>   tests/qemu-iotests/141.out |  2 +-
>>>   tests/qemu-iotests/245     | 19 +++++++---
>>>   5 files changed, 81 insertions(+), 88 deletions(-)
>>
>> Looks like stream_run() could be a bit streamlined now (the allocation
>> checking should be unnecessary, unconditionally calling
>> stream_populate() should be sufficient), but not necessary now.
>>
> 
> That is what I had kept in my mind when I tackled this patch. But there 
> is an underwater reef to streamline. Namely, how the block-stream job 
> gets known about a long unallocated tail to exit the loop earlier in the 
> stream_run(). Shall we return the '-EOF' or another error code from the 
> cor_co_preadv_part() to be handled by the stream_run()? Any other 
> suggestions, if any, will be appreciated.
> 
>>> diff --git a/block/stream.c b/block/stream.c
>>> index d3e1812..93564db 100644
>>> --- a/block/stream.c
>>> +++ b/block/stream.c
>>
>> [...]
> 
>>> +
>>> +    cor_filter_bs = bdrv_cor_filter_append(bs, opts, BDRV_O_RDWR, 
>>> errp);
>>> +    if (cor_filter_bs == NULL) {
>>> +        goto fail;
>>> +    }
>>> +
>>> +    if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
>>
>> Is there a reason why we can’t combine this with the
>> bdrv_free_backing_chain() from bs down to above_base?  I mean, the
>> effect should be the same, just asking.
>>
> 
> The bdrv_freeze_backing_chain(bs, above_base, errp) is called before the 
> bdrv_reopen_set_read_only() to keep the backing chain safe during the 
> context switch. Then we will want to freeze the 'COR -> TOP BS' link as 
> well. Freezing/unfreezing parts is simlier to manage than doing that 
> with the whole chain.
> If we decide to invoke the bdrv_reopen_set_read_only() after freezing 
> the backing chain together with the COR-filter, we will not be able to 
> get the 'write' permission on the read-only node.
> 
> 
>>> +        bdrv_cor_filter_drop(cor_filter_bs);
>>> +        cor_filter_bs = NULL;
>>> +        goto fail;
>>> +    }
>>> +
>>> +    s = block_job_create(job_id, &stream_job_driver, NULL, 
>>> cor_filter_bs,
>>> +                         BLK_PERM_CONSISTENT_READ,
>>> +                         basic_flags | BLK_PERM_WRITE | 
>>> BLK_PERM_GRAPH_MOD,
>>
>> Not that I’m an expert on the GRAPH_MOD permission, but why is this
>> shared here but not below?  Shouldn’t it be the same in both cases?
>> (Same for taking it as a permission.)
>>
> 
> When we invoke the block_job_add_bdrv(&s->common, "active node", bs,..) 
> below (particularly, we need it to block the operations on the top node, 
> bdrv_op_block_all()), we ask for the GRAPH_MOD permission for the top 
> node. To allow that, the parent filter node should share that permission 
> for the underlying node. Otherwise, we get assertion failed in the 
> bdrv_check_update_perm() called from bdrv_replace_node() when we remove 
> the filter.
> 

I will add my comments above to the code.

Andrey


[...]


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 13/13] block: apply COR-filter to block-stream jobs
  2020-10-15 17:16     ` Andrey Shinkevich
  2020-10-16 15:06       ` Andrey Shinkevich
@ 2020-10-16 15:45       ` Vladimir Sementsov-Ogievskiy
  2020-10-20 19:43         ` Andrey Shinkevich
  1 sibling, 1 reply; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-16 15:45 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

15.10.2020 20:16, Andrey Shinkevich wrote:
> On 14.10.2020 19:24, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
> 
> [...]
> 
>>> ---
>>>   block/stream.c             | 93 +++++++++++++++++++++++++++++-----------------
>>>   tests/qemu-iotests/030     | 51 +++----------------------
>>>   tests/qemu-iotests/030.out |  4 +-
>>>   tests/qemu-iotests/141.out |  2 +-
>>>   tests/qemu-iotests/245     | 19 +++++++---
>>>   5 files changed, 81 insertions(+), 88 deletions(-)
>>
>> Looks like stream_run() could be a bit streamlined now (the allocation
>> checking should be unnecessary, unconditionally calling
>> stream_populate() should be sufficient), but not necessary now.
>>
> 
> That is what I had kept in my mind when I tackled this patch. But there is an underwater reef to streamline. Namely, how the block-stream job gets known about a long unallocated tail to exit the loop earlier in the stream_run(). Shall we return the '-EOF' or another error code from the cor_co_preadv_part() to be handled by the stream_run()? Any other suggestions, if any, will be appreciated.

Just calling read CHUNK by CHUNK may be less efficient than is_allocated()-driven loop: you may end up with splitting regions unaligned to CHUNK-granularity, which would not be splitted with is_allocated()-driven loop. Current loop allows chunks unaligned to CHUNK.

So, I think, it's better to keep is_allocated() logic as is for now.



-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 13/13] block: apply COR-filter to block-stream jobs
  2020-10-16 15:45       ` Vladimir Sementsov-Ogievskiy
@ 2020-10-20 19:43         ` Andrey Shinkevich
  0 siblings, 0 replies; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-20 19:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

On 16.10.2020 18:45, Vladimir Sementsov-Ogievskiy wrote:
> 15.10.2020 20:16, Andrey Shinkevich wrote:
>> On 14.10.2020 19:24, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>
>> [...]
>>
>>>> ---
>>>>   block/stream.c             | 93 
>>>> +++++++++++++++++++++++++++++-----------------
>>>>   tests/qemu-iotests/030     | 51 +++----------------------
>>>>   tests/qemu-iotests/030.out |  4 +-
>>>>   tests/qemu-iotests/141.out |  2 +-
>>>>   tests/qemu-iotests/245     | 19 +++++++---
>>>>   5 files changed, 81 insertions(+), 88 deletions(-)
>>>
>>> Looks like stream_run() could be a bit streamlined now (the allocation
>>> checking should be unnecessary, unconditionally calling
>>> stream_populate() should be sufficient), but not necessary now.
>>>
>>
>> That is what I had kept in my mind when I tackled this patch. But 
>> there is an underwater reef to streamline. Namely, how the 
>> block-stream job gets known about a long unallocated tail to exit the 
>> loop earlier in the stream_run(). Shall we return the '-EOF' or 
>> another error code from the cor_co_preadv_part() to be handled by the 
>> stream_run()? Any other suggestions, if any, will be appreciated.
> 
> Just calling read CHUNK by CHUNK may be less efficient than 
> is_allocated()-driven loop: you may end up with splitting regions 
> unaligned to CHUNK-granularity, which would not be splitted with 
> is_allocated()-driven loop. Current loop allows chunks unaligned to CHUNK.

The cor_co_preadv_part() will check for the end of a file in the next 
version. So, the unalignment is not going to be the issue.

Andrey

> 
> So, I think, it's better to keep is_allocated() logic as is for now.
> 
> 
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-14 15:22     ` Vladimir Sementsov-Ogievskiy
  2020-10-14 16:30       ` Max Reitz
@ 2020-10-21 20:43       ` Andrey Shinkevich
  2020-10-22  7:50         ` Andrey Shinkevich
  1 sibling, 1 reply; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-21 20:43 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
> 14.10.2020 15:51, Max Reitz wrote:
>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>> COR-driver to skip unneeded reading. It can be taken into account for
>>> the COR-algorithms optimization. That check is being made during the
>>> block stream job by the moment.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 13 +++++++++----
>>>   block/io.c           |  3 ++-
>>>   2 files changed, 11 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index b136895..278a11a 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -148,10 +148,15 @@ static int coroutine_fn 
>>> cor_co_preadv_part(BlockDriverState *bs,
>>>               }
>>>           }
>>> -        ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, 
>>> qiov_offset,
>>> -                                  local_flags);
>>> -        if (ret < 0) {
>>> -            return ret;
>>> +        if (!!(flags & BDRV_REQ_PREFETCH) &
>>
>> How about dropping the double negation and using a logical && instead of
>> the binary &?
>>
>>> +            !(local_flags & BDRV_REQ_COPY_ON_READ)) {
>>> +            /* Skip non-guest reads if no copy needed */
>>> +        } else {
>>
>> Hm.  I would have just written the negated form
>>
>> (!(flags & BDRV_REQ_PREFETCH) || (local_flags & BDRV_REQ_COPY_ON_READ))
>>
>> and put the “skip” comment above that condition.
>>
>> (Since local_flags is initialized to flags, it can be written as a
>> single comparison, but that’s a matter of taste and I’m not going to
>> recommend either over the other:
>>
>> ((local_flags & (BDRV_REQ_PREFETCH | BDRV_REQ_COPY_ON_READ)) !=
>> BDRV_REQ_PREFETCH)
>>
>> )
>>
>>> +            ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, 
>>> qiov_offset,
>>> +                                      local_flags);
>>> +            if (ret < 0) {
>>> +                return ret;
>>> +            }
>>>           }
>>>           offset += n;
>>> diff --git a/block/io.c b/block/io.c
>>> index 11df188..bff1808 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn 
>>> bdrv_aligned_preadv(BdrvChild *child,
>>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 
>>> qiov_offset, 0);
>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>> +                                 flags & bs->supported_read_flags);
> 
> 
> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) 
> NULL. This means, that we can't just drop the flag when call the driver 
> that doesn't support it.
> 
> Actually, if driver doesn't support the PREFETCH flag we should do nothing.
> 
> 
>>
>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>> why it isn’t.
>>
> 
> 
> Could it be part of patch 07? I mean introduce new field 
> supported_read_flags and handle it in generic code in one patch, prior 
> to implementing support for it in COR driver.
> 
> 

We have to add the supported flags for the COR driver in the same patch. 
Or before handling the supported_read_flags at the generic layer 
(handling zero does not make a sence). Otherwise, the test #216 (where 
the COR-filter is applied) will not pass.

Andrey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-21 20:43       ` Andrey Shinkevich
@ 2020-10-22  7:50         ` Andrey Shinkevich
  2020-10-22  8:56           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 56+ messages in thread
From: Andrey Shinkevich @ 2020-10-22  7:50 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den


On 21.10.2020 23:43, Andrey Shinkevich wrote:
> On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
>> 14.10.2020 15:51, Max Reitz wrote:
>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>>> COR-driver to skip unneeded reading. It can be taken into account for
>>>> the COR-algorithms optimization. That check is being made during the
>>>> block stream job by the moment.
>>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---

[...]

>>>> diff --git a/block/io.c b/block/io.c
>>>> index 11df188..bff1808 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn 
>>>> bdrv_aligned_preadv(BdrvChild *child,
>>>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, 
>>>> qiov_offset, 0);
>>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>>> +                                 flags & bs->supported_read_flags);
>>
>>
>> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should 
>> be) NULL. This means, that we can't just drop the flag when call the 
>> driver that doesn't support it.
>>
>> Actually, if driver doesn't support the PREFETCH flag we should do 
>> nothing.
>>
>>
>>>
>>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>>> why it isn’t.
>>>
>>
>>
>> Could it be part of patch 07? I mean introduce new field 
>> supported_read_flags and handle it in generic code in one patch, prior 
>> to implementing support for it in COR driver.
>>
>>
> 
> We have to add the supported flags for the COR driver in the same patch. 
> Or before handling the supported_read_flags at the generic layer 
> (handling zero does not make a sence). Otherwise, the test #216 (where 
> the COR-filter is applied) will not pass.
> 
> Andrey

I have found a workaround and am going to send all the related patches 
as a separate series.

Andrey


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed
  2020-10-22  7:50         ` Andrey Shinkevich
@ 2020-10-22  8:56           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 56+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-10-22  8:56 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, fam, stefanha, armbru, jsnow, libvir-list,
	eblake, den

22.10.2020 10:50, Andrey Shinkevich wrote:
> 
> On 21.10.2020 23:43, Andrey Shinkevich wrote:
>> On 14.10.2020 18:22, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.10.2020 15:51, Max Reitz wrote:
>>>> On 12.10.20 19:43, Andrey Shinkevich wrote:
>>>>> If the flag BDRV_REQ_PREFETCH was set, pass it further to the
>>>>> COR-driver to skip unneeded reading. It can be taken into account for
>>>>> the COR-algorithms optimization. That check is being made during the
>>>>> block stream job by the moment.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>>> ---
> 
> [...]
> 
>>>>> diff --git a/block/io.c b/block/io.c
>>>>> index 11df188..bff1808 100644
>>>>> --- a/block/io.c
>>>>> +++ b/block/io.c
>>>>> @@ -1512,7 +1512,8 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
>>>>>       max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align);
>>>>>       if (bytes <= max_bytes && bytes <= max_transfer) {
>>>>> -        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset, 0);
>>>>> +        ret = bdrv_driver_preadv(bs, offset, bytes, qiov, qiov_offset,
>>>>> +                                 flags & bs->supported_read_flags);
>>>
>>>
>>> When BDRV_REQ_PREFETCH is passed, qiov may be (and generally should be) NULL. This means, that we can't just drop the flag when call the driver that doesn't support it.
>>>
>>> Actually, if driver doesn't support the PREFETCH flag we should do nothing.
>>>
>>>
>>>>
>>>> Ah, OK.  I see.  I expected this to be a separate patch.  I still wonder
>>>> why it isn’t.
>>>>
>>>
>>>
>>> Could it be part of patch 07? I mean introduce new field supported_read_flags and handle it in generic code in one patch, prior to implementing support for it in COR driver.
>>>
>>>
>>
>> We have to add the supported flags for the COR driver in the same patch. Or before handling the supported_read_flags at the generic layer (handling zero does not make a sence). Otherwise, the test #216 (where the COR-filter is applied) will not pass.
>>
>> Andrey
> 
> I have found a workaround and am going to send all the related patches as a separate series.
> 

What is the problem?

If in a separate patch prior to modifying COR driver, we add new field supported_read_flags and add a support for it in generic layer, when no driver support it yet, nothing should be changed and all tests should pass..



-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2020-10-22  8:58 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-12 17:43 [PATCH v11 00/13] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
2020-10-12 17:43 ` [PATCH v11 01/13] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
2020-10-12 17:43 ` [PATCH v11 02/13] copy-on-read: add filter append/drop functions Andrey Shinkevich via
2020-10-14 10:44   ` Max Reitz
2020-10-14 14:28     ` Andrey Shinkevich
2020-10-14 16:26       ` Max Reitz
2020-10-12 17:43 ` [PATCH v11 03/13] qapi: add filter-node-name to block-stream Andrey Shinkevich via
2020-10-12 17:43 ` [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver Andrey Shinkevich via
2020-10-14 11:09   ` Max Reitz
2020-10-14 11:57     ` Max Reitz
2020-10-14 14:56       ` Vladimir Sementsov-Ogievskiy
2020-10-14 16:27         ` Max Reitz
2020-10-14 16:08     ` Andrey Shinkevich
2020-10-14 16:18       ` Vladimir Sementsov-Ogievskiy
2020-10-14 16:36       ` Max Reitz
2020-10-12 17:43 ` [PATCH v11 05/13] copy-on-read: limit COR operations to base in " Andrey Shinkevich via
2020-10-14 11:59   ` Max Reitz
2020-10-14 17:43     ` Andrey Shinkevich
2020-10-14 12:01   ` Max Reitz
2020-10-14 18:57     ` Andrey Shinkevich
2020-10-15 15:56       ` Max Reitz
2020-10-15 17:37         ` Andrey Shinkevich
2020-10-16 14:28           ` Vladimir Sementsov-Ogievskiy
2020-10-12 17:43 ` [PATCH v11 06/13] block: modify the comment for BDRV_REQ_PREFETCH flag Andrey Shinkevich via
2020-10-14 12:22   ` Max Reitz
2020-10-14 15:04     ` Vladimir Sementsov-Ogievskiy
2020-10-14 19:57     ` Andrey Shinkevich
2020-10-12 17:43 ` [PATCH v11 07/13] block: include supported_read_flags into BDS structure Andrey Shinkevich via
2020-10-14 12:31   ` Max Reitz
2020-10-12 17:43 ` [PATCH v11 08/13] copy-on-read: add support for BDRV_REQ_PREFETCH to COR-filter Andrey Shinkevich via
2020-10-14 12:40   ` Max Reitz
2020-10-12 17:43 ` [PATCH v11 09/13] copy-on-read: skip non-guest reads if no copy needed Andrey Shinkevich via
2020-10-14 12:51   ` Max Reitz
2020-10-14 15:22     ` Vladimir Sementsov-Ogievskiy
2020-10-14 16:30       ` Max Reitz
2020-10-14 16:39         ` Vladimir Sementsov-Ogievskiy
2020-10-15 15:49           ` Max Reitz
2020-10-21 20:43       ` Andrey Shinkevich
2020-10-22  7:50         ` Andrey Shinkevich
2020-10-22  8:56           ` Vladimir Sementsov-Ogievskiy
2020-10-14 20:49     ` Andrey Shinkevich
2020-10-12 17:43 ` [PATCH v11 10/13] stream: skip filters when writing backing file name to QCOW2 header Andrey Shinkevich via
2020-10-14 15:02   ` Max Reitz
2020-10-14 15:40     ` Vladimir Sementsov-Ogievskiy
2020-10-12 17:43 ` [PATCH v11 11/13] stream: mark backing-file argument as deprecated Andrey Shinkevich via
2020-10-14 15:03   ` Max Reitz
2020-10-14 15:43     ` Vladimir Sementsov-Ogievskiy
2020-10-15  9:01       ` Andrey Shinkevich
2020-10-12 17:43 ` [PATCH v11 12/13] stream: remove unused backing-file name parameter Andrey Shinkevich via
2020-10-14 15:05   ` Max Reitz
2020-10-12 17:43 ` [PATCH v11 13/13] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
2020-10-14 16:24   ` Max Reitz
2020-10-15 17:16     ` Andrey Shinkevich
2020-10-16 15:06       ` Andrey Shinkevich
2020-10-16 15:45       ` Vladimir Sementsov-Ogievskiy
2020-10-20 19:43         ` Andrey Shinkevich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).