All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/7] Apply COR-filter to the block-stream permanently
@ 2020-08-28 16:52 Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 1/7] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

Note: this series is based on the another one "block: Deal with filters"
      by Max Reitz that could be found in the branches:
      https://git.xanclic.moe/XanClic/qemu child-access-functions-v6
      https://github.com/XanClic/qemu child-access-functions-v6

v8:
  03: qapi - version changed to 'Since: 5.2'.
  04: New.
  05: New.
  06: New.
  07: The extra bs variable and the enable_cor were dropped.

The v7 Message-Id:
<1598257914-887267-1-git-send-email-andrey.shinkevich@virtuozzo.com>

Andrey Shinkevich (7):
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  copy-on-read: pass base file name to COR driver
  copy-on-read: limit guest writes to base in COR driver
  block-stream: freeze link to base node during stream job
  block: apply COR-filter to block-stream jobs

 block/copy-on-read.c           | 218 ++++++++++++++++++++++++++++++++++++++---
 block/copy-on-read.h           |  36 +++++++
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c                 | 105 ++++++++++----------
 blockdev.c                     |   4 +-
 include/block/block_int.h      |   7 +-
 qapi/block-core.json           |   6 ++
 tests/qemu-iotests/030         |  60 +++---------
 tests/qemu-iotests/030.out     |   4 +-
 tests/qemu-iotests/245         |  21 ++--
 tests/qemu-iotests/258         | 161 ------------------------------
 tests/qemu-iotests/258.out     |  33 -------
 12 files changed, 341 insertions(+), 318 deletions(-)
 create mode 100644 block/copy-on-read.h
 delete mode 100755 tests/qemu-iotests/258
 delete mode 100644 tests/qemu-iotests/258.out

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v8 1/7] copy-on-read: Support preadv/pwritev_part functions
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 2/7] copy-on-read: add filter append/drop functions Andrey Shinkevich via
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

Add support for the recently introduced functions
bdrv_co_preadv_part()
and
bdrv_co_pwritev_part()
to the COR-filter driver.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-on-read.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 2816e61..cb03e0f 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -74,21 +74,25 @@ static int64_t cor_getlength(BlockDriverState *bs)
 }
 
 
-static int coroutine_fn cor_co_preadv(BlockDriverState *bs,
-                                      uint64_t offset, uint64_t bytes,
-                                      QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_preadv_part(BlockDriverState *bs,
+                                           uint64_t offset, uint64_t bytes,
+                                           QEMUIOVector *qiov,
+                                           size_t qiov_offset,
+                                           int flags)
 {
-    return bdrv_co_preadv(bs->file, offset, bytes, qiov,
-                          flags | BDRV_REQ_COPY_ON_READ);
+    return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
+                               flags | BDRV_REQ_COPY_ON_READ);
 }
 
 
-static int coroutine_fn cor_co_pwritev(BlockDriverState *bs,
-                                       uint64_t offset, uint64_t bytes,
-                                       QEMUIOVector *qiov, int flags)
+static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
+                                            uint64_t offset,
+                                            uint64_t bytes,
+                                            QEMUIOVector *qiov,
+                                            size_t qiov_offset, int flags)
 {
-
-    return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+    return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
+                                flags);
 }
 
 
@@ -137,8 +141,8 @@ static BlockDriver bdrv_copy_on_read = {
 
     .bdrv_getlength                     = cor_getlength,
 
-    .bdrv_co_preadv                     = cor_co_preadv,
-    .bdrv_co_pwritev                    = cor_co_pwritev,
+    .bdrv_co_preadv_part                = cor_co_preadv_part,
+    .bdrv_co_pwritev_part               = cor_co_pwritev_part,
     .bdrv_co_pwrite_zeroes              = cor_co_pwrite_zeroes,
     .bdrv_co_pdiscard                   = cor_co_pdiscard,
     .bdrv_co_pwritev_compressed         = cor_co_pwritev_compressed,
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 1/7] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-09-04 11:22   ` Max Reitz
  2020-08-28 16:52 ` [PATCH v8 3/7] qapi: add filter-node-name to block-stream Andrey Shinkevich via
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

Provide API for the COR-filter insertion/removal.
Also, drop the filter child permissions for an inactive state when the
filter node is being removed.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/copy-on-read.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/copy-on-read.h |  35 +++++++++++++++++
 2 files changed, 139 insertions(+)
 create mode 100644 block/copy-on-read.h

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index cb03e0f..0ede7aa 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -23,11 +23,21 @@
 #include "qemu/osdep.h"
 #include "block/block_int.h"
 #include "qemu/module.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "block/copy-on-read.h"
+
+
+typedef struct BDRVStateCOR {
+    bool active;
+} BDRVStateCOR;
 
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
+    BDRVStateCOR *state = bs->opaque;
+
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
                                false, errp);
@@ -42,6 +52,13 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
             bs->file->bs->supported_zero_flags);
 
+    state->active = true;
+
+    /*
+     * We don't need to call bdrv_child_refresh_perms() now as the permissions
+     * will be updated later when the filter node gets its parent.
+     */
+
     return 0;
 }
 
@@ -57,6 +74,17 @@ static void cor_child_perm(BlockDriverState *bs, BdrvChild *c,
                            uint64_t perm, uint64_t shared,
                            uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVStateCOR *s = bs->opaque;
+
+    if (!s->active) {
+        /*
+         * While the filter is being removed
+         */
+        *nperm = 0;
+        *nshared = BLK_PERM_ALL;
+        return;
+    }
+
     *nperm = perm & PERM_PASSTHROUGH;
     *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
 
@@ -135,6 +163,7 @@ static void cor_lock_medium(BlockDriverState *bs, bool locked)
 
 static BlockDriver bdrv_copy_on_read = {
     .format_name                        = "copy-on-read",
+    .instance_size                      = sizeof(BDRVStateCOR),
 
     .bdrv_open                          = cor_open,
     .bdrv_child_perm                    = cor_child_perm,
@@ -159,4 +188,79 @@ static void bdrv_copy_on_read_init(void)
     bdrv_register(&bdrv_copy_on_read);
 }
 
+
+static BlockDriverState *create_filter_node(BlockDriverState *bs,
+                                            const char *filter_node_name,
+                                            Error **errp)
+{
+    QDict *opts = qdict_new();
+
+    qdict_put_str(opts, "driver", "copy-on-read");
+    qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+    if (filter_node_name) {
+        qdict_put_str(opts, "node-name", filter_node_name);
+    }
+
+    return bdrv_open(NULL, NULL, opts, BDRV_O_RDWR, errp);
+}
+
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         const char *filter_node_name,
+                                         Error **errp)
+{
+    BlockDriverState *cor_filter_bs;
+    Error *local_err = NULL;
+
+    cor_filter_bs = create_filter_node(bs, filter_node_name, errp);
+    if (cor_filter_bs == NULL) {
+        error_prepend(errp, "Could not create filter node: ");
+        return NULL;
+    }
+
+    if (!filter_node_name) {
+        cor_filter_bs->implicit = true;
+    }
+
+    bdrv_drained_begin(bs);
+    bdrv_replace_node(bs, cor_filter_bs, &local_err);
+    bdrv_drained_end(bs);
+
+    if (local_err) {
+        bdrv_unref(cor_filter_bs);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return cor_filter_bs;
+}
+
+
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs)
+{
+    BdrvChild *child;
+    BlockDriverState *bs;
+    BDRVStateCOR *s = cor_filter_bs->opaque;
+
+    child = bdrv_filter_child(cor_filter_bs);
+    if (!child) {
+        return;
+    }
+    bs = child->bs;
+
+    /* Retain the BDS until we complete the graph change. */
+    bdrv_ref(bs);
+    /* Hold a guest back from writing while permissions are being reset. */
+    bdrv_drained_begin(bs);
+    /* Drop permissions before the graph change. */
+    s->active = false;
+    bdrv_child_refresh_perms(cor_filter_bs, child, &error_abort);
+    bdrv_replace_node(cor_filter_bs, bs, &error_abort);
+
+    bdrv_drained_end(bs);
+    bdrv_unref(bs);
+    bdrv_unref(cor_filter_bs);
+}
+
+
 block_init(bdrv_copy_on_read_init);
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
new file mode 100644
index 0000000..1686e4e
--- /dev/null
+++ b/block/copy-on-read.h
@@ -0,0 +1,35 @@
+/*
+ * Copy-on-read filter block driver
+ *
+ * The filter driver performs Copy-On-Read (COR) operations
+ *
+ * Copyright (c) 2018-2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef COPY_ON_READ_FILTER
+#define COPY_ON_READ_FILTER
+
+#include "block/block_int.h"
+
+BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         const char *filter_node_name,
+                                         Error **errp);
+void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
+
+#endif /* COPY_ON_READ_FILTER */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 3/7] qapi: add filter-node-name to block-stream
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 1/7] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 2/7] copy-on-read: add filter append/drop functions Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 4/7] copy-on-read: pass base file name to COR driver Andrey Shinkevich via
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

Provide the possibility to pass the 'filter-node-name' parameter to the
block-stream job as it is done for the commit block job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/monitor/block-hmp-cmds.c | 4 ++--
 block/stream.c                 | 4 +++-
 blockdev.c                     | 4 +++-
 include/block/block_int.h      | 7 ++++++-
 qapi/block-core.json           | 6 ++++++
 5 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4d3db5e..4e66775 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -507,8 +507,8 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 
     qmp_block_stream(true, device, device, base != NULL, base, false, NULL,
                      false, NULL, qdict_haskey(qdict, "speed"), speed, true,
-                     BLOCKDEV_ON_ERROR_REPORT, false, false, false, false,
-                     &error);
+                     BLOCKDEV_ON_ERROR_REPORT, false, NULL, false, false, false,
+                     false, &error);
 
     hmp_handle_error(mon, error);
 }
diff --git a/block/stream.c b/block/stream.c
index b9c1141..8bf6b6d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -221,7 +221,9 @@ static const BlockJobDriver stream_job_driver = {
 void stream_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *base, const char *backing_file_str,
                   int creation_flags, int64_t speed,
-                  BlockdevOnError on_error, Error **errp)
+                  BlockdevOnError on_error,
+                  const char *filter_node_name,
+                  Error **errp)
 {
     StreamBlockJob *s;
     BlockDriverState *iter;
diff --git a/blockdev.c b/blockdev.c
index 237fffb..cc531cb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2476,6 +2476,7 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
                       bool has_backing_file, const char *backing_file,
                       bool has_speed, int64_t speed,
                       bool has_on_error, BlockdevOnError on_error,
+                      bool has_filter_node_name, const char *filter_node_name,
                       bool has_auto_finalize, bool auto_finalize,
                       bool has_auto_dismiss, bool auto_dismiss,
                       Error **errp)
@@ -2558,7 +2559,8 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
     }
 
     stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
-                 job_flags, has_speed ? speed : 0, on_error, &local_err);
+                 job_flags, has_speed ? speed : 0, on_error,
+                 filter_node_name, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto out;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 465a601..3efde33 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1122,6 +1122,9 @@ int is_windows_drive(const char *filename);
  *                  See @BlockJobCreateFlags
  * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
  * @on_error: The action to take upon error.
+ * @filter_node_name: The node name that should be assigned to the filter
+ * driver that the commit job inserts into the graph above @bs. NULL means
+ * that a node name should be autogenerated.
  * @errp: Error object.
  *
  * Start a streaming operation on @bs.  Clusters that are unallocated
@@ -1134,7 +1137,9 @@ int is_windows_drive(const char *filename);
 void stream_start(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *base, const char *backing_file_str,
                   int creation_flags, int64_t speed,
-                  BlockdevOnError on_error, Error **errp);
+                  BlockdevOnError on_error,
+                  const char *filter_node_name,
+                  Error **errp);
 
 /**
  * commit_start:
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0b8ccd3..e5ccf8a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2524,6 +2524,11 @@
 #            'stop' and 'enospc' can only be used if the block device
 #            supports io-status (see BlockInfo).  Since 1.3.
 #
+# @filter-node-name: the node name that should be assigned to the
+#                    filter driver that the stream job inserts into the graph
+#                    above @device. If this option is not given, a node name is
+#                    autogenerated. (Since: 5.2)
+#
 # @auto-finalize: When false, this job will wait in a PENDING state after it has
 #                 finished its work, waiting for @block-job-finalize before
 #                 making any block graph changes.
@@ -2554,6 +2559,7 @@
   'data': { '*job-id': 'str', 'device': 'str', '*base': 'str',
             '*base-node': 'str', '*backing-file': 'str', '*speed': 'int',
             '*on-error': 'BlockdevOnError',
+            '*filter-node-name': 'str',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
 
 ##
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 4/7] copy-on-read: pass base file name to COR driver
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (2 preceding siblings ...)
  2020-08-28 16:52 ` [PATCH v8 3/7] qapi: add filter-node-name to block-stream Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-09-04 12:17   ` Max Reitz
  2020-08-28 16:52 ` [PATCH v8 5/7] copy-on-read: limit guest writes to base in " Andrey Shinkevich via
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

To limit the guest's COR operations by the base node in the backing
chain during stream job, pass the base file name to the copy-on-read
driver. The rest of the functionality will be implemented in the patch
that follows.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 block/copy-on-read.h |  1 +
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 0ede7aa..1f858bb 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -24,19 +24,45 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
 #include "qapi/qmp/qdict.h"
 #include "block/copy-on-read.h"
 
 
 typedef struct BDRVStateCOR {
     bool active;
+    BlockDriverState *base_bs;
 } BDRVStateCOR;
 
+/*
+ * Non-zero pointers are the caller's responsibility.
+ */
+static BlockDriverState *get_base_by_name(BlockDriverState *bs,
+                                          const char *base_name, Error **errp)
+{
+    BlockDriverState *base_bs = NULL;
+    AioContext *aio_context;
+
+    base_bs = bdrv_find_backing_image(bs, base_name);
+    if (base_bs == NULL) {
+        error_setg(errp, QERR_BASE_NOT_FOUND, base_name);
+        return NULL;
+    }
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+    assert(bdrv_get_aio_context(base_bs) == aio_context);
+    aio_context_release(aio_context);
+
+    return base_bs;
+}
 
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
+    BlockDriverState *base_bs = NULL;
     BDRVStateCOR *state = bs->opaque;
+    const char *base_name = qdict_get_try_str(options, "base");
 
     bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
                                BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -52,7 +78,15 @@ static int cor_open(BlockDriverState *bs, QDict *options, int flags,
         ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
             bs->file->bs->supported_zero_flags);
 
+    if (base_name) {
+        qdict_del(options, "base");
+        base_bs = get_base_by_name(bs, base_name, errp);
+        if (!base_bs) {
+            return -EINVAL;
+        }
+    }
     state->active = true;
+    state->base_bs = base_bs;
 
     /*
      * We don't need to call bdrv_child_refresh_perms() now as the permissions
@@ -190,6 +224,7 @@ static void bdrv_copy_on_read_init(void)
 
 
 static BlockDriverState *create_filter_node(BlockDriverState *bs,
+                                            const char *base_name,
                                             const char *filter_node_name,
                                             Error **errp)
 {
@@ -197,6 +232,9 @@ static BlockDriverState *create_filter_node(BlockDriverState *bs,
 
     qdict_put_str(opts, "driver", "copy-on-read");
     qdict_put_str(opts, "file", bdrv_get_node_name(bs));
+    if (base_name) {
+        qdict_put_str(opts, "base", base_name);
+    }
     if (filter_node_name) {
         qdict_put_str(opts, "node-name", filter_node_name);
     }
@@ -206,13 +244,14 @@ static BlockDriverState *create_filter_node(BlockDriverState *bs,
 
 
 BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         const char *base_name,
                                          const char *filter_node_name,
                                          Error **errp)
 {
     BlockDriverState *cor_filter_bs;
     Error *local_err = NULL;
 
-    cor_filter_bs = create_filter_node(bs, filter_node_name, errp);
+    cor_filter_bs = create_filter_node(bs, base_name, filter_node_name, errp);
     if (cor_filter_bs == NULL) {
         error_prepend(errp, "Could not create filter node: ");
         return NULL;
diff --git a/block/copy-on-read.h b/block/copy-on-read.h
index 1686e4e..6a7c8bb 100644
--- a/block/copy-on-read.h
+++ b/block/copy-on-read.h
@@ -28,6 +28,7 @@
 #include "block/block_int.h"
 
 BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
+                                         const char *base_name,
                                          const char *filter_node_name,
                                          Error **errp);
 void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (3 preceding siblings ...)
  2020-08-28 16:52 ` [PATCH v8 4/7] copy-on-read: pass base file name to COR driver Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-09-04 12:50   ` Max Reitz
  2020-08-28 16:52 ` [PATCH v8 6/7] block-stream: freeze link to base node during stream job Andrey Shinkevich via
  2020-08-28 16:52 ` [PATCH v8 7/7] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
  6 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

Limit the guest's COR operations by the base node in the backing chain
during a stream job.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/copy-on-read.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 1f858bb..ecbd1f8 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -57,6 +57,37 @@ static BlockDriverState *get_base_by_name(BlockDriverState *bs,
     return base_bs;
 }
 
+/*
+ * Returns 1 if the block is allocated in a node between cor_filter_bs->file->bs
+ * and the base_bs in the backing chain. Otherwise, returns 0.
+ * The COR operation is allowed if the base_bs is not specified - return 1.
+ * Returns negative errno on failure.
+ */
+static int node_above_base(BlockDriverState *cor_filter_bs, uint64_t offset,
+                           uint64_t bytes)
+{
+    int ret;
+    int64_t dummy;
+    BlockDriverState *file = NULL;
+    BDRVStateCOR *state = cor_filter_bs->opaque;
+
+    if (!state->base_bs) {
+        return 1;
+    }
+
+    ret = bdrv_block_status_above(cor_filter_bs->file->bs, state->base_bs,
+                                  offset, bytes, &dummy, NULL, &file);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (file) {
+        return 1;
+    }
+
+    return 0;
+}
+
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
@@ -153,6 +184,12 @@ static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
                                             QEMUIOVector *qiov,
                                             size_t qiov_offset, int flags)
 {
+    int ret = node_above_base(bs, offset, bytes);
+
+    if (!ret || ret < 0) {
+        return ret;
+    }
+
     return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
                                 flags);
 }
@@ -162,6 +199,12 @@ static int coroutine_fn cor_co_pwrite_zeroes(BlockDriverState *bs,
                                              int64_t offset, int bytes,
                                              BdrvRequestFlags flags)
 {
+    int ret = node_above_base(bs, offset, bytes);
+
+    if (!ret || ret < 0) {
+        return ret;
+    }
+
     return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
 }
 
@@ -178,6 +221,12 @@ static int coroutine_fn cor_co_pwritev_compressed(BlockDriverState *bs,
                                                   uint64_t bytes,
                                                   QEMUIOVector *qiov)
 {
+    int ret = node_above_base(bs, offset, bytes);
+
+    if (!ret || ret < 0) {
+        return ret;
+    }
+
     return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
                            BDRV_REQ_WRITE_COMPRESSED);
 }
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 6/7] block-stream: freeze link to base node during stream job
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (4 preceding siblings ...)
  2020-08-28 16:52 ` [PATCH v8 5/7] copy-on-read: limit guest writes to base in " Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-09-04 13:21   ` Max Reitz
  2020-08-28 16:52 ` [PATCH v8 7/7] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
  6 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

To keep the base node unchanged during the block-stream operation,
freeze it as the other part of the backing chain with the intermediate
nodes related to the job.
This patch revers the change made with the commit c624b015bf as the
correct base file name and its format have to be written down to the
QCOW2 header on the disk when the backing file is being changed after
the stream job completes.
This reversion incurs changes in the tests 030, 245 and discards the
test 258 where concurrent stream/commit jobs are tested. When the link
to a base node is frozen, the concurrent job cannot change the common
backing chain.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/stream.c             |  29 ++------
 tests/qemu-iotests/030     |  10 +--
 tests/qemu-iotests/245     |   2 +-
 tests/qemu-iotests/258     | 161 ---------------------------------------------
 tests/qemu-iotests/258.out |  33 ----------
 5 files changed, 14 insertions(+), 221 deletions(-)
 delete mode 100755 tests/qemu-iotests/258
 delete mode 100644 tests/qemu-iotests/258.out

diff --git a/block/stream.c b/block/stream.c
index 8bf6b6d..fee4117 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -32,7 +32,7 @@ enum {
 typedef struct StreamBlockJob {
     BlockJob common;
     BlockDriverState *base_overlay; /* COW overlay (stream from this) */
-    BlockDriverState *above_base;   /* Node directly above the base */
+    BlockDriverState *base;   /* The base node */
     BlockdevOnError on_error;
     char *backing_file_str;
     bool bs_read_only;
@@ -54,7 +54,7 @@ static void stream_abort(Job *job)
 
     if (s->chain_frozen) {
         BlockJob *bjob = &s->common;
-        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->above_base);
+        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->base);
     }
 }
 
@@ -64,11 +64,11 @@ static int stream_prepare(Job *job)
     BlockJob *bjob = &s->common;
     BlockDriverState *bs = blk_bs(bjob->blk);
     BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
-    BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
+    BlockDriverState *base = s->base;
     Error *local_err = NULL;
     int ret = 0;
 
-    bdrv_unfreeze_backing_chain(bs, s->above_base);
+    bdrv_unfreeze_backing_chain(bs, s->base);
     s->chain_frozen = false;
 
     if (bdrv_cow_child(unfiltered_bs)) {
@@ -230,7 +230,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     bool bs_read_only;
     int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
     BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
-    BlockDriverState *above_base;
 
     if (!base_overlay) {
         error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -238,20 +237,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
         return;
     }
 
-    /*
-     * Find the node directly above @base.  @base_overlay is a COW overlay, so
-     * it must have a bdrv_cow_child(), but it is the immediate overlay of
-     * @base, so between the two there can only be filters.
-     */
-    above_base = base_overlay;
-    if (bdrv_cow_bs(above_base) != base) {
-        above_base = bdrv_cow_bs(above_base);
-        while (bdrv_filter_bs(above_base) != base) {
-            above_base = bdrv_filter_bs(above_base);
-        }
-    }
-
-    if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
+    if (bdrv_freeze_backing_chain(bs, base, errp) < 0) {
         return;
     }
 
@@ -284,7 +270,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
      * above_base node might change after the call to
      * bdrv_reopen_set_read_only() due to parallel block jobs running.
      */
-    base = bdrv_filter_or_cow_bs(above_base);
     for (iter = bdrv_filter_or_cow_bs(bs); iter != base;
          iter = bdrv_filter_or_cow_bs(iter))
     {
@@ -293,7 +278,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     }
 
     s->base_overlay = base_overlay;
-    s->above_base = above_base;
+    s->base = base;
     s->backing_file_str = g_strdup(backing_file_str);
     s->bs_read_only = bs_read_only;
     s->chain_frozen = true;
@@ -307,5 +292,5 @@ fail:
     if (bs_read_only) {
         bdrv_reopen_set_read_only(bs, true, NULL);
     }
-    bdrv_unfreeze_backing_chain(bs, above_base);
+    bdrv_unfreeze_backing_chain(bs, base);
 }
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 1cdd7e2..c565e76 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -382,7 +382,7 @@ class TestParallelOps(iotests.QMPTestCase):
         # Stream from node2 into node4
         result = self.vm.qmp('block-stream', device='node4', base_node='node2', job_id='node4')
         self.assert_qmp(result, 'error/desc',
-            "Cannot freeze 'backing' link to 'commit-filter'")
+            "Cannot change 'backing' link from 'commit-filter' to 'node2'")
 
         result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
         self.assert_qmp(result, 'return', {})
@@ -406,13 +406,15 @@ class TestParallelOps(iotests.QMPTestCase):
         # Stream from node2 into node4
         result = self.vm.qmp('block-stream', device='node4',
                              base_node='commit-filter', job_id='node4')
-        self.assert_qmp(result, 'return', {})
+        self.assert_qmp(result, 'error/desc',
+                        "Cannot freeze 'backing' link to 'commit-filter'")
 
         result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
         self.assert_qmp(result, 'return', {})
 
-        self.vm.run_job(job='drive0', auto_dismiss=True)
-        self.vm.run_job(job='node4', auto_dismiss=True)
+        self.wait_until_completed()
+        #self.vm.run_job(job='drive0', auto_dismiss=True)
+        #self.vm.run_job(job='node4', auto_dismiss=True)
         self.assert_no_active_block_jobs()
 
     # Test a block-stream and a block-commit job in parallel
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 5035763..b9399ef 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -872,7 +872,7 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 
         # We can remove hd2 while the stream job is ongoing
         opts['backing']['backing'] = None
-        self.reopen(opts, {})
+        self.reopen(opts, {}, "Cannot change 'backing' link from 'hd1' to 'hd2'")
 
         # We can't remove hd1 while the stream job is ongoing
         opts['backing'] = None
diff --git a/tests/qemu-iotests/258 b/tests/qemu-iotests/258
deleted file mode 100755
index e305a15..0000000
--- a/tests/qemu-iotests/258
+++ /dev/null
@@ -1,161 +0,0 @@
-#!/usr/bin/env python3
-#
-# Very specific tests for adjacent commit/stream block jobs
-#
-# Copyright (C) 2019 Red Hat, Inc.
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
-# (at your option) any later version.
-#
-# This program is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program.  If not, see <http://www.gnu.org/licenses/>.
-#
-# Creator/Owner: Max Reitz <mreitz@redhat.com>
-
-import iotests
-from iotests import log, qemu_img, qemu_io_silent, \
-        filter_qmp_testfiles, filter_qmp_imgfmt
-
-# Returns a node for blockdev-add
-def node(node_name, path, backing=None, fmt=None, throttle=None):
-    if fmt is None:
-        fmt = iotests.imgfmt
-
-    res = {
-        'node-name': node_name,
-        'driver': fmt,
-        'file': {
-            'driver': 'file',
-            'filename': path
-        }
-    }
-
-    if backing is not None:
-        res['backing'] = backing
-
-    if throttle:
-        res['file'] = {
-            'driver': 'throttle',
-            'throttle-group': throttle,
-            'file': res['file']
-        }
-
-    return res
-
-# Finds a node in the debug block graph
-def find_graph_node(graph, node_id):
-    return next(node for node in graph['nodes'] if node['id'] == node_id)
-
-
-def test_concurrent_finish(write_to_stream_node):
-    log('')
-    log('=== Commit and stream finish concurrently (letting %s write) ===' % \
-        ('stream' if write_to_stream_node else 'commit'))
-    log('')
-
-    # All chosen in such a way that when the commit job wants to
-    # finish, it polls and thus makes stream finish concurrently --
-    # and the other way around, depending on whether the commit job
-    # is finalized before stream completes or not.
-
-    with iotests.FilePath('node4.img') as node4_path, \
-         iotests.FilePath('node3.img') as node3_path, \
-         iotests.FilePath('node2.img') as node2_path, \
-         iotests.FilePath('node1.img') as node1_path, \
-         iotests.FilePath('node0.img') as node0_path, \
-         iotests.VM() as vm:
-
-        # It is important to use raw for the base layer (so that
-        # permissions are just handed through to the protocol layer)
-        assert qemu_img('create', '-f', 'raw', node0_path, '64M') == 0
-
-        stream_throttle=None
-        commit_throttle=None
-
-        for path in [node1_path, node2_path, node3_path, node4_path]:
-            assert qemu_img('create', '-f', iotests.imgfmt, path, '64M') == 0
-
-        if write_to_stream_node:
-            # This is what (most of the time) makes commit finish
-            # earlier and then pull in stream
-            assert qemu_io_silent(node2_path,
-                                  '-c', 'write %iK 64K' % (65536 - 192),
-                                  '-c', 'write %iK 64K' % (65536 -  64)) == 0
-
-            stream_throttle='tg'
-        else:
-            # And this makes stream finish earlier
-            assert qemu_io_silent(node1_path,
-                                  '-c', 'write %iK 64K' % (65536 - 64)) == 0
-
-            commit_throttle='tg'
-
-        vm.launch()
-
-        vm.qmp_log('object-add',
-                   qom_type='throttle-group',
-                   id='tg',
-                   props={
-                       'x-iops-write': 1,
-                       'x-iops-write-max': 1
-                   })
-
-        vm.qmp_log('blockdev-add',
-                   filters=[filter_qmp_testfiles, filter_qmp_imgfmt],
-                   **node('node4', node4_path, throttle=stream_throttle,
-                     backing=node('node3', node3_path,
-                     backing=node('node2', node2_path,
-                     backing=node('node1', node1_path,
-                     backing=node('node0', node0_path, throttle=commit_throttle,
-                                  fmt='raw'))))))
-
-        vm.qmp_log('block-commit',
-                   job_id='commit',
-                   device='node4',
-                   filter_node_name='commit-filter',
-                   top_node='node1',
-                   base_node='node0',
-                   auto_finalize=False)
-
-        vm.qmp_log('block-stream',
-                   job_id='stream',
-                   device='node3',
-                   base_node='commit-filter')
-
-        if write_to_stream_node:
-            vm.run_job('commit', auto_finalize=False, auto_dismiss=True)
-            vm.run_job('stream', auto_finalize=True, auto_dismiss=True)
-        else:
-            # No, the jobs do not really finish concurrently here,
-            # the stream job does complete strictly before commit.
-            # But still, this is close enough for what we want to
-            # test.
-            vm.run_job('stream', auto_finalize=True, auto_dismiss=True)
-            vm.run_job('commit', auto_finalize=False, auto_dismiss=True)
-
-        # Assert that the backing node of node3 is node 0 now
-        graph = vm.qmp('x-debug-query-block-graph')['return']
-        for edge in graph['edges']:
-            if edge['name'] == 'backing' and \
-               find_graph_node(graph, edge['parent'])['name'] == 'node3':
-                assert find_graph_node(graph, edge['child'])['name'] == 'node0'
-                break
-
-
-def main():
-    log('Running tests:')
-    test_concurrent_finish(True)
-    test_concurrent_finish(False)
-
-if __name__ == '__main__':
-    # Need backing file and change-backing-file support
-    iotests.script_main(main,
-                        supported_fmts=['qcow2', 'qed'],
-                        supported_platforms=['linux'])
diff --git a/tests/qemu-iotests/258.out b/tests/qemu-iotests/258.out
deleted file mode 100644
index ce6e9ba..0000000
--- a/tests/qemu-iotests/258.out
+++ /dev/null
@@ -1,33 +0,0 @@
-Running tests:
-
-=== Commit and stream finish concurrently (letting stream write) ===
-
-{"execute": "object-add", "arguments": {"id": "tg", "props": {"x-iops-write": 1, "x-iops-write-max": 1}, "qom-type": "throttle-group"}}
-{"return": {}}
-{"execute": "blockdev-add", "arguments": {"backing": {"backing": {"backing": {"backing": {"driver": "raw", "file": {"driver": "file", "filename": "TEST_DIR/PID-node0.img"}, "node-name": "node0"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node1.img"}, "node-name": "node1"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node2.img"}, "node-name": "node2"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node3.img"}, "node-name": "node3"}, "driver": "IMGFMT", "file": {"driver": "throttle", "file": {"driver": "file", "filename": "TEST_DIR/PID-node4.img"}, "throttle-group": "tg"}, "node-name": "node4"}}
-{"return": {}}
-{"execute": "block-commit", "arguments": {"auto-finalize": false, "base-node": "node0", "device": "node4", "filter-node-name": "commit-filter", "job-id": "commit", "top-node": "node1"}}
-{"return": {}}
-{"execute": "block-stream", "arguments": {"base-node": "commit-filter", "device": "node3", "job-id": "stream"}}
-{"return": {}}
-{"execute": "job-finalize", "arguments": {"id": "commit"}}
-{"return": {}}
-{"data": {"id": "commit", "type": "commit"}, "event": "BLOCK_JOB_PENDING", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "commit", "len": 67108864, "offset": 67108864, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "stream", "len": 67108864, "offset": 67108864, "speed": 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-
-=== Commit and stream finish concurrently (letting commit write) ===
-
-{"execute": "object-add", "arguments": {"id": "tg", "props": {"x-iops-write": 1, "x-iops-write-max": 1}, "qom-type": "throttle-group"}}
-{"return": {}}
-{"execute": "blockdev-add", "arguments": {"backing": {"backing": {"backing": {"backing": {"driver": "raw", "file": {"driver": "throttle", "file": {"driver": "file", "filename": "TEST_DIR/PID-node0.img"}, "throttle-group": "tg"}, "node-name": "node0"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node1.img"}, "node-name": "node1"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node2.img"}, "node-name": "node2"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node3.img"}, "node-name": "node3"}, "driver": "IMGFMT", "file": {"driver": "file", "filename": "TEST_DIR/PID-node4.img"}, "node-name": "node4"}}
-{"return": {}}
-{"execute": "block-commit", "arguments": {"auto-finalize": false, "base-node": "node0", "device": "node4", "filter-node-name": "commit-filter", "job-id": "commit", "top-node": "node1"}}
-{"return": {}}
-{"execute": "block-stream", "arguments": {"base-node": "commit-filter", "device": "node3", "job-id": "stream"}}
-{"return": {}}
-{"data": {"device": "stream", "len": 67108864, "offset": 67108864, "speed": 0, "type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-{"execute": "job-finalize", "arguments": {"id": "commit"}}
-{"return": {}}
-{"data": {"id": "commit", "type": "commit"}, "event": "BLOCK_JOB_PENDING", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "commit", "len": 67108864, "offset": 67108864, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 7/7] block: apply COR-filter to block-stream jobs
  2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
                   ` (5 preceding siblings ...)
  2020-08-28 16:52 ` [PATCH v8 6/7] block-stream: freeze link to base node during stream job Andrey Shinkevich via
@ 2020-08-28 16:52 ` Andrey Shinkevich via
  2020-09-04 13:41   ` Max Reitz
  6 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich via @ 2020-08-28 16:52 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, kwolf, mreitz, armbru, jsnow, eblake, den,
	vsementsov, andrey.shinkevich

This patch completes the series with the COR-filter insertion for
block-stream operations. Adding the filter makes it possible for copied
regions to be discarded in backing files during the block-stream job,
what will reduce the disk overuse.
The COR-filter insertion incurs changes in the iotests case
245:test_block_stream_4 that reopens the backing chain during a
block-stream job. There are changes in the iotests #030 as well.
The iotests case 030:test_stream_parallel was deleted due to multiple
conflicts between the concurrent job operations over the same backing
chain. The base backing node for one job is the top node for another
job. It may change due to the filter node inserted into the backing
chain while both jobs are running. Another issue is that the parts of
the backing chain are being frozen by the running job and may not be
changed by the concurrent job when needed. The concept of the parallel
jobs with common nodes is considered vital no more.

Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
---
 block/stream.c             | 78 +++++++++++++++++++++++++++-------------------
 tests/qemu-iotests/030     | 50 +++--------------------------
 tests/qemu-iotests/030.out |  4 +--
 tests/qemu-iotests/245     | 19 ++++++++---
 4 files changed, 67 insertions(+), 84 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index fee4117..ab8ba39 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -19,6 +19,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/ratelimit.h"
 #include "sysemu/block-backend.h"
+#include "block/copy-on-read.h"
 
 enum {
     /*
@@ -33,6 +34,8 @@ typedef struct StreamBlockJob {
     BlockJob common;
     BlockDriverState *base_overlay; /* COW overlay (stream from this) */
     BlockDriverState *base;   /* The base node */
+    BlockDriverState *cor_filter_bs;
+    BlockDriverState *target_bs;
     BlockdevOnError on_error;
     char *backing_file_str;
     bool bs_read_only;
@@ -53,22 +56,19 @@ static void stream_abort(Job *job)
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 
     if (s->chain_frozen) {
-        BlockJob *bjob = &s->common;
-        bdrv_unfreeze_backing_chain(blk_bs(bjob->blk), s->base);
+        bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->base);
     }
 }
 
 static int stream_prepare(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
-    BlockJob *bjob = &s->common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
-    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
     BlockDriverState *base = s->base;
     Error *local_err = NULL;
     int ret = 0;
 
-    bdrv_unfreeze_backing_chain(bs, s->base);
+    bdrv_unfreeze_backing_chain(s->cor_filter_bs, s->base);
     s->chain_frozen = false;
 
     if (bdrv_cow_child(unfiltered_bs)) {
@@ -94,13 +94,14 @@ static void stream_clean(Job *job)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockJob *bjob = &s->common;
-    BlockDriverState *bs = blk_bs(bjob->blk);
+
+    bdrv_cor_filter_drop(s->cor_filter_bs);
 
     /* Reopen the image back in read-only mode if necessary */
     if (s->bs_read_only) {
         /* Give up write permissions before making it read-only */
         blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
-        bdrv_reopen_set_read_only(bs, true, NULL);
+        bdrv_reopen_set_read_only(s->target_bs, true, NULL);
     }
 
     g_free(s->backing_file_str);
@@ -110,9 +111,7 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
 {
     StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
     BlockBackend *blk = s->common.blk;
-    BlockDriverState *bs = blk_bs(blk);
-    BlockDriverState *unfiltered_bs = bdrv_skip_filters(bs);
-    bool enable_cor = !bdrv_cow_child(s->base_overlay);
+    BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
     int64_t len;
     int64_t offset = 0;
     uint64_t delay_ns = 0;
@@ -124,21 +123,12 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
         return 0;
     }
 
-    len = bdrv_getlength(bs);
+    len = bdrv_getlength(s->target_bs);
     if (len < 0) {
         return len;
     }
     job_progress_set_remaining(&s->common.job, len);
 
-    /* Turn on copy-on-read for the whole block device so that guest read
-     * requests help us make progress.  Only do this when copying the entire
-     * backing chain since the copy-on-read operation does not take base into
-     * account.
-     */
-    if (enable_cor) {
-        bdrv_enable_copy_on_read(bs);
-    }
-
     for ( ; offset < len; offset += n) {
         bool copy;
         int ret;
@@ -197,10 +187,6 @@ static int coroutine_fn stream_run(Job *job, Error **errp)
         }
     }
 
-    if (enable_cor) {
-        bdrv_disable_copy_on_read(bs);
-    }
-
     /* Do not remove the backing file if an error was there but ignored. */
     return error;
 }
@@ -230,6 +216,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
     bool bs_read_only;
     int basic_flags = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED;
     BlockDriverState *base_overlay = bdrv_find_overlay(bs, base);
+    BlockDriverState *cor_filter_bs = NULL;
 
     if (!base_overlay) {
         error_setg(errp, "'%s' is not in the backing chain of '%s'",
@@ -250,17 +237,37 @@ void stream_start(const char *job_id, BlockDriverState *bs,
         }
     }
 
-    /* Prevent concurrent jobs trying to modify the graph structure here, we
-     * already have our own plans. Also don't allow resize as the image size is
-     * queried only at the job start and then cached. */
-    s = block_job_create(job_id, &stream_job_driver, NULL, bs,
-                         basic_flags | BLK_PERM_GRAPH_MOD,
-                         basic_flags | BLK_PERM_WRITE,
+    cor_filter_bs = bdrv_cor_filter_append(bs, backing_file_str,
+                                           filter_node_name, errp);
+    if (cor_filter_bs == NULL) {
+        goto fail;
+    }
+
+    if (bdrv_freeze_backing_chain(cor_filter_bs, bs, errp) < 0) {
+        bdrv_cor_filter_drop(cor_filter_bs);
+        cor_filter_bs = NULL;
+        goto fail;
+    }
+
+    s = block_job_create(job_id, &stream_job_driver, NULL, cor_filter_bs,
+                         BLK_PERM_CONSISTENT_READ,
+                         basic_flags | BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD,
                          speed, creation_flags, NULL, NULL, errp);
     if (!s) {
         goto fail;
     }
 
+    /*
+     * Prevent concurrent jobs trying to modify the graph structure here, we
+     * already have our own plans. Also don't allow resize as the image size is
+     * queried only at the job start and then cached.
+     */
+    if (block_job_add_bdrv(&s->common, "active node", bs,
+                           basic_flags | BLK_PERM_GRAPH_MOD,
+                           basic_flags | BLK_PERM_WRITE, &error_abort)) {
+        goto fail;
+    }
+
     /* Block all intermediate nodes between bs and base, because they will
      * disappear from the chain after this operation. The streaming job reads
      * every block only once, assuming that it doesn't change, so forbid writes
@@ -279,6 +286,8 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 
     s->base_overlay = base_overlay;
     s->base = base;
+    s->cor_filter_bs = cor_filter_bs;
+    s->target_bs = bs;
     s->backing_file_str = g_strdup(backing_file_str);
     s->bs_read_only = bs_read_only;
     s->chain_frozen = true;
@@ -292,5 +301,10 @@ fail:
     if (bs_read_only) {
         bdrv_reopen_set_read_only(bs, true, NULL);
     }
-    bdrv_unfreeze_backing_chain(bs, base);
+    if (cor_filter_bs) {
+        bdrv_unfreeze_backing_chain(cor_filter_bs, base);
+        bdrv_cor_filter_drop(cor_filter_bs);
+    } else {
+        bdrv_unfreeze_backing_chain(bs, base);
+    }
 }
diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index c565e76..d651779 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -221,60 +221,20 @@ class TestParallelOps(iotests.QMPTestCase):
         for img in self.imgs:
             os.remove(img)
 
-    # Test that it's possible to run several block-stream operations
-    # in parallel in the same snapshot chain
-    def test_stream_parallel(self):
-        self.assert_no_active_block_jobs()
-
-        # Check that the maps don't match before the streaming operations
-        for i in range(2, self.num_imgs, 2):
-            self.assertNotEqual(qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[i]),
-                                qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', self.imgs[i-1]),
-                                'image file map matches backing file before streaming')
-
-        # Create all streaming jobs
-        pending_jobs = []
-        for i in range(2, self.num_imgs, 2):
-            node_name = 'node%d' % i
-            job_id = 'stream-%s' % node_name
-            pending_jobs.append(job_id)
-            result = self.vm.qmp('block-stream', device=node_name, job_id=job_id, base=self.imgs[i-2], speed=512*1024)
-            self.assert_qmp(result, 'return', {})
-
-        for job in pending_jobs:
-            result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
-            self.assert_qmp(result, 'return', {})
-
-        # Wait for all jobs to be finished.
-        while len(pending_jobs) > 0:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_COMPLETED':
-                    job_id = self.dictpath(event, 'data/device')
-                    self.assertTrue(job_id in pending_jobs)
-                    self.assert_qmp_absent(event, 'data/error')
-                    pending_jobs.remove(job_id)
-
-        self.assert_no_active_block_jobs()
-        self.vm.shutdown()
-
-        # Check that all maps match now
-        for i in range(2, self.num_imgs, 2):
-            self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i]),
-                             qemu_io('-f', iotests.imgfmt, '-c', 'map', self.imgs[i-1]),
-                             'image file map does not match backing file after streaming')
-
     # Test that it's not possible to perform two block-stream
     # operations if there are nodes involved in both.
     def test_overlapping_1(self):
         self.assert_no_active_block_jobs()
 
         # Set a speed limit to make sure that this job blocks the rest
-        result = self.vm.qmp('block-stream', device='node4', job_id='stream-node4', base=self.imgs[1], speed=1024*1024)
+        result = self.vm.qmp('block-stream', device='node4',
+                             job_id='stream-node4', base=self.imgs[1],
+                             filter_node_name='stream-filter', speed=1024*1024)
         self.assert_qmp(result, 'return', {})
 
         result = self.vm.qmp('block-stream', device='node5', job_id='stream-node5', base=self.imgs[2])
         self.assert_qmp(result, 'error/desc',
-            "Node 'node4' is busy: block device is in use by block job: stream")
+            "Node 'stream-filter' is busy: block device is in use by block job: stream")
 
         result = self.vm.qmp('block-stream', device='node3', job_id='stream-node3', base=self.imgs[2])
         self.assert_qmp(result, 'error/desc',
@@ -287,7 +247,7 @@ class TestParallelOps(iotests.QMPTestCase):
         # block-commit should also fail if it touches nodes used by the stream job
         result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[4], job_id='commit-node4')
         self.assert_qmp(result, 'error/desc',
-            "Node 'node4' is busy: block device is in use by block job: stream")
+            "Node 'stream-filter' is busy: block device is in use by block job: stream")
 
         result = self.vm.qmp('block-commit', device='drive0', base=self.imgs[1], top=self.imgs[3], job_id='commit-node1')
         self.assert_qmp(result, 'error/desc',
diff --git a/tests/qemu-iotests/030.out b/tests/qemu-iotests/030.out
index 6d9bee1..5eb508d 100644
--- a/tests/qemu-iotests/030.out
+++ b/tests/qemu-iotests/030.out
@@ -1,5 +1,5 @@
-...........................
+..........................
 ----------------------------------------------------------------------
-Ran 27 tests
+Ran 26 tests
 
 OK
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index b9399ef..c756cfe 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -898,17 +898,26 @@ class TestBlockdevReopen(iotests.QMPTestCase):
         # make hd1 read-only and block-stream requires it to be read-write
         # (Which error message appears depends on whether the stream job is
         # already done with copying at this point.)
-        self.reopen(opts, {},
+        # As the COR-filter node is inserted into the backing chain with the
+        # 'block-stream' operation, we move the options to their proper nodes.
+        opts = hd_opts(1)
+        opts['backing'] = hd_opts(2)
+        opts['backing']['backing'] = None
+        self.reopen(opts, {'read-only': True},
             ["Can't set node 'hd1' to r/o with copy-on-read enabled",
              "Cannot make block node read-only, there is a writer on it"])
 
         # We can't remove hd2 while the stream job is ongoing
-        opts['backing']['backing'] = None
-        self.reopen(opts, {'backing.read-only': False}, "Cannot change 'backing' link from 'hd1' to 'hd2'")
+        opts['backing'] = None
+        self.reopen(opts, {'read-only': False},
+                    "Cannot change 'backing' link from 'hd1' to 'hd2'")
 
-        # We can detach hd1 from hd0 because it doesn't affect the stream job
+        # We can't detach hd1 from hd0 because there is the COR-filter implicit
+        # node in between.
+        opts = hd_opts(0)
         opts['backing'] = None
-        self.reopen(opts)
+        self.reopen(opts, {},
+                    "Cannot change backing link if 'hd0' has an implicit backing file")
 
         self.vm.run_job('stream0', auto_finalize = False, auto_dismiss = True)
 
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-08-28 16:52 ` [PATCH v8 2/7] copy-on-read: add filter append/drop functions Andrey Shinkevich via
@ 2020-09-04 11:22   ` Max Reitz
  2020-09-17 16:09     ` Andrey Shinkevich
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-04 11:22 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, vsementsov, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 3900 bytes --]

On 28.08.20 18:52, Andrey Shinkevich wrote:
> Provide API for the COR-filter insertion/removal.

Hm.  Why?

I see the implementation is just bdrv_open() + bdrv_replace_node(),
which is what I would have expected.  Why can’t the caller just do that?

Or maybe it would even make sense to just make it a generic block driver
function, like

BlockDriverState *
bdrv_insert_node(BlockDriverState *bs, const char *node_driver,
                 const char *node_name, QDict *node_options,
                 Error **errp);

?

(Similarly for dropping a node from a chain.)

> Also, drop the filter child permissions for an inactive state when the
> filter node is being removed.

Do we need .active for that?  Shouldn’t it be sufficient to just not
require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
node (i.e. perm == 0 in cor_child_perm())?

Of course, using an .active field works, too.  But Vladimir wanted a
similar field in the preallocate filter, and there already is in
backup-top.  I feel like there shouldn’t be a need for this.

.bdrv_child_perm() should generally be able to decide what permissions
it needs based on the information it gets.  If every driver needs to
keep track of whether it has any parents and feed that information into
.bdrv_child_perm() as external state, then something feels wrong.

If perm == 0 is not sufficient to rule out any parents, we should just
explicitly add a boolean that tells .bdrv_child_perm() whether there are
any parents or not – shouldn’t be too difficult.

> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/copy-on-read.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/copy-on-read.h |  35 +++++++++++++++++
>  2 files changed, 139 insertions(+)
>  create mode 100644 block/copy-on-read.h

[...]

>  block_init(bdrv_copy_on_read_init);
> diff --git a/block/copy-on-read.h b/block/copy-on-read.h
> new file mode 100644
> index 0000000..1686e4e
> --- /dev/null
> +++ b/block/copy-on-read.h
> @@ -0,0 +1,35 @@
> +/*
> + * Copy-on-read filter block driver
> + *
> + * The filter driver performs Copy-On-Read (COR) operations
> + *
> + * Copyright (c) 2018-2020 Virtuozzo International GmbH.
> + *
> + * Author:
> + *   Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef COPY_ON_READ_FILTER
> +#define COPY_ON_READ_FILTER

Bit of a weird include guard, seeing that most header files in qemu
(certainly under block/) base their name on their filename.  So this
would be BLOCK_COPY_ON_READ_H (or COPY_ON_READ_H).

(It’s just that COPY_ON_READ_FILTER kind of sounds like a configured
option, i.e. whether the filter is present or not.)

Max

> +
> +#include "block/block_int.h"
> +
> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
> +                                         const char *filter_node_name,
> +                                         Error **errp);
> +void bdrv_cor_filter_drop(BlockDriverState *cor_filter_bs);
> +
> +#endif /* COPY_ON_READ_FILTER */
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 4/7] copy-on-read: pass base file name to COR driver
  2020-08-28 16:52 ` [PATCH v8 4/7] copy-on-read: pass base file name to COR driver Andrey Shinkevich via
@ 2020-09-04 12:17   ` Max Reitz
  2020-09-04 12:26     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-04 12:17 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, vsementsov, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 2640 bytes --]

On 28.08.20 18:52, Andrey Shinkevich wrote:
> To limit the guest's COR operations by the base node in the backing
> chain during stream job, pass the base file name to the copy-on-read

Does it have to be a filename?  That sounds really bad to me.

> driver. The rest of the functionality will be implemented in the patch
> that follows.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  block/copy-on-read.h |  1 +
>  2 files changed, 41 insertions(+), 1 deletion(-)

Furthermore, I believe that this option should become an externally
visible option for the copy-on-read filter (i.e., part of its
BlockdevOptions) – but that definitely won’t be viable if @base contains
a filename.

Can’t we let the stream job invoke bdrv_find_backing_image() to
translate a filename into a node name that’s then passed to the COR filter?

> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index 0ede7aa..1f858bb 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -24,19 +24,45 @@
>  #include "block/block_int.h"
>  #include "qemu/module.h"
>  #include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qdict.h"
>  #include "block/copy-on-read.h"
>  
>  
>  typedef struct BDRVStateCOR {
>      bool active;
> +    BlockDriverState *base_bs;
>  } BDRVStateCOR;
>  
> +/*
> + * Non-zero pointers are the caller's responsibility.
> + */
> +static BlockDriverState *get_base_by_name(BlockDriverState *bs,
> +                                          const char *base_name, Error **errp)
> +{
> +    BlockDriverState *base_bs = NULL;
> +    AioContext *aio_context;
> +
> +    base_bs = bdrv_find_backing_image(bs, base_name);
> +    if (base_bs == NULL) {
> +        error_setg(errp, QERR_BASE_NOT_FOUND, base_name);
> +        return NULL;
> +    }
> +
> +    aio_context = bdrv_get_aio_context(bs);
> +    aio_context_acquire(aio_context);
> +    assert(bdrv_get_aio_context(base_bs) == aio_context);
> +    aio_context_release(aio_context);

Er.  OK.  But why?  Isn’t this just guaranteed by the block layer?  I
don’t think we need an explicit assertion for this, especially if it
means having to acquire an AioContext.

Furthermore, I don’t even know why we’d need the AioContext.  On one
hand, we don’t need to acquire a context just to get it or compare it;
on the other, this I would have thought that .bdrv_open() runs in the
BDS’s AioContext anyway (or the caller already has it acquired at least).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 4/7] copy-on-read: pass base file name to COR driver
  2020-09-04 12:17   ` Max Reitz
@ 2020-09-04 12:26     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-04 12:26 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

04.09.2020 15:17, Max Reitz wrote:
> On 28.08.20 18:52, Andrey Shinkevich wrote:
>> To limit the guest's COR operations by the base node in the backing
>> chain during stream job, pass the base file name to the copy-on-read
> 
> Does it have to be a filename?  That sounds really bad to me.

Agree, it should be node-name. Filename-based interfaces is a headache.

> 
>> driver. The rest of the functionality will be implemented in the patch
>> that follows.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>>   block/copy-on-read.h |  1 +
>>   2 files changed, 41 insertions(+), 1 deletion(-)
> 
> Furthermore, I believe that this option should become an externally
> visible option for the copy-on-read filter (i.e., part of its
> BlockdevOptions) – but that definitely won’t be viable if @base contains
> a filename.
> 
> Can’t we let the stream job invoke bdrv_find_backing_image() to
> translate a filename into a node name that’s then passed to the COR filter?
> 
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index 0ede7aa..1f858bb 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -24,19 +24,45 @@
>>   #include "block/block_int.h"
>>   #include "qemu/module.h"
>>   #include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>>   #include "qapi/qmp/qdict.h"
>>   #include "block/copy-on-read.h"
>>   
>>   
>>   typedef struct BDRVStateCOR {
>>       bool active;
>> +    BlockDriverState *base_bs;
>>   } BDRVStateCOR;
>>   
>> +/*
>> + * Non-zero pointers are the caller's responsibility.
>> + */
>> +static BlockDriverState *get_base_by_name(BlockDriverState *bs,
>> +                                          const char *base_name, Error **errp)
>> +{
>> +    BlockDriverState *base_bs = NULL;
>> +    AioContext *aio_context;
>> +
>> +    base_bs = bdrv_find_backing_image(bs, base_name);
>> +    if (base_bs == NULL) {
>> +        error_setg(errp, QERR_BASE_NOT_FOUND, base_name);
>> +        return NULL;
>> +    }
>> +
>> +    aio_context = bdrv_get_aio_context(bs);
>> +    aio_context_acquire(aio_context);
>> +    assert(bdrv_get_aio_context(base_bs) == aio_context);
>> +    aio_context_release(aio_context);
> 
> Er.  OK.  But why?  Isn’t this just guaranteed by the block layer?  I
> don’t think we need an explicit assertion for this, especially if it
> means having to acquire an AioContext.
> 
> Furthermore, I don’t even know why we’d need the AioContext.  On one
> hand, we don’t need to acquire a context just to get it or compare it;
> on the other, this I would have thought that .bdrv_open() runs in the
> BDS’s AioContext anyway (or the caller already has it acquired at least).
> 
> Max
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver
  2020-08-28 16:52 ` [PATCH v8 5/7] copy-on-read: limit guest writes to base in " Andrey Shinkevich via
@ 2020-09-04 12:50   ` Max Reitz
  2020-09-04 13:59     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-04 12:50 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, vsementsov, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 4012 bytes --]

On 28.08.20 18:52, Andrey Shinkevich wrote:
> Limit the guest's COR operations by the base node in the backing chain
> during a stream job.

I don’t understand.   Shouldn’t we limit the areas where we set the COR
flag?

> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/copy-on-read.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index 1f858bb..ecbd1f8 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -57,6 +57,37 @@ static BlockDriverState *get_base_by_name(BlockDriverState *bs,
>      return base_bs;
>  }
>  
> +/*
> + * Returns 1 if the block is allocated in a node between cor_filter_bs->file->bs
> + * and the base_bs in the backing chain. Otherwise, returns 0.
> + * The COR operation is allowed if the base_bs is not specified - return 1.
> + * Returns negative errno on failure.
> + */
> +static int node_above_base(BlockDriverState *cor_filter_bs, uint64_t offset,
> +                           uint64_t bytes)
> +{
> +    int ret;
> +    int64_t dummy;
> +    BlockDriverState *file = NULL;
> +    BDRVStateCOR *state = cor_filter_bs->opaque;
> +
> +    if (!state->base_bs) {
> +        return 1;
> +    }
> +
> +    ret = bdrv_block_status_above(cor_filter_bs->file->bs, state->base_bs,
> +                                  offset, bytes, &dummy, NULL, &file);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (file) {

Why check file and not the return value?

> +        return 1;
> +    }
> +
> +    return 0;

“dummy” should really not be called that way, it should be evaluated
whether it’s smaller than bytes.

First, [offset, offset + dummy) may not be allocated above the base –
but [offset + dummy, offset + bytes) may be.  Then this function returns
0 here, even though there is something in that range that’s allocated.

Second, in that case we still shouldn’t return 1, but return the
shrunken offset instead.  Or, if there are multiple distinct allocated
areas, they should probably even all be copied separately.


(But all of that of course only if this function is intended to be used
to limit where we set the COR flag, because I don’t understand why we’d
want to limit where something can be written.)

Max

> +}
> +
>  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>                      Error **errp)
>  {
> @@ -153,6 +184,12 @@ static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
>                                              QEMUIOVector *qiov,
>                                              size_t qiov_offset, int flags)
>  {
> +    int ret = node_above_base(bs, offset, bytes);
> +
> +    if (!ret || ret < 0) {
> +        return ret;
> +    }
> +
>      return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
>                                  flags);
>  }
> @@ -162,6 +199,12 @@ static int coroutine_fn cor_co_pwrite_zeroes(BlockDriverState *bs,
>                                               int64_t offset, int bytes,
>                                               BdrvRequestFlags flags)
>  {
> +    int ret = node_above_base(bs, offset, bytes);
> +
> +    if (!ret || ret < 0) {
> +        return ret;
> +    }
> +
>      return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
>  }
>  
> @@ -178,6 +221,12 @@ static int coroutine_fn cor_co_pwritev_compressed(BlockDriverState *bs,
>                                                    uint64_t bytes,
>                                                    QEMUIOVector *qiov)
>  {
> +    int ret = node_above_base(bs, offset, bytes);
> +
> +    if (!ret || ret < 0) {
> +        return ret;
> +    }
> +
>      return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
>                             BDRV_REQ_WRITE_COMPRESSED);
>  }
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/7] block-stream: freeze link to base node during stream job
  2020-08-28 16:52 ` [PATCH v8 6/7] block-stream: freeze link to base node during stream job Andrey Shinkevich via
@ 2020-09-04 13:21   ` Max Reitz
  2020-09-04 13:48     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-04 13:21 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, vsementsov, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 1467 bytes --]

On 28.08.20 18:52, Andrey Shinkevich wrote:
> To keep the base node unchanged during the block-stream operation,
> freeze it as the other part of the backing chain with the intermediate
> nodes related to the job.
> This patch revers the change made with the commit c624b015bf as the
> correct base file name and its format have to be written down to the
> QCOW2 header on the disk when the backing file is being changed after
> the stream job completes.
> This reversion incurs changes in the tests 030, 245 and discards the
> test 258 where concurrent stream/commit jobs are tested. When the link
> to a base node is frozen, the concurrent job cannot change the common
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/stream.c             |  29 ++------
>  tests/qemu-iotests/030     |  10 +--
>  tests/qemu-iotests/245     |   2 +-
>  tests/qemu-iotests/258     | 161 ---------------------------------------------
>  tests/qemu-iotests/258.out |  33 ----------
>  5 files changed, 14 insertions(+), 221 deletions(-)
>  delete mode 100755 tests/qemu-iotests/258
>  delete mode 100644 tests/qemu-iotests/258.out

Does this need to be in this series?  (I’m not entirely sure, based on
what I can see in patch 7.)

When doing this, should we introduce a @bottom-node option alongside, so
that we can make all the tests that are deleted here pass still, just
with changes?

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 7/7] block: apply COR-filter to block-stream jobs
  2020-08-28 16:52 ` [PATCH v8 7/7] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
@ 2020-09-04 13:41   ` Max Reitz
  0 siblings, 0 replies; 29+ messages in thread
From: Max Reitz @ 2020-09-04 13:41 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-block
  Cc: kwolf, vsementsov, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 1722 bytes --]

On 28.08.20 18:52, Andrey Shinkevich wrote:
> This patch completes the series with the COR-filter insertion for
> block-stream operations. Adding the filter makes it possible for copied
> regions to be discarded in backing files during the block-stream job,
> what will reduce the disk overuse.
> The COR-filter insertion incurs changes in the iotests case
> 245:test_block_stream_4 that reopens the backing chain during a
> block-stream job. There are changes in the iotests #030 as well.
> The iotests case 030:test_stream_parallel was deleted due to multiple
> conflicts between the concurrent job operations over the same backing
> chain. The base backing node for one job is the top node for another
> job. It may change due to the filter node inserted into the backing
> chain while both jobs are running. Another issue is that the parts of
> the backing chain are being frozen by the running job and may not be
> changed by the concurrent job when needed. The concept of the parallel
> jobs with common nodes is considered vital no more.
> 
> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
> ---
>  block/stream.c             | 78 +++++++++++++++++++++++++++-------------------
>  tests/qemu-iotests/030     | 50 +++--------------------------
>  tests/qemu-iotests/030.out |  4 +--
>  tests/qemu-iotests/245     | 19 ++++++++---
>  4 files changed, 67 insertions(+), 84 deletions(-)

I’m not sure I can really review this, when I don’t know what the COR
filter really does.  For example, as it is in this version, it
unconditionally performs COR, and so on guest reads will copy everything
from the bottom layers even when they are below the base node.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/7] block-stream: freeze link to base node during stream job
  2020-09-04 13:21   ` Max Reitz
@ 2020-09-04 13:48     ` Vladimir Sementsov-Ogievskiy
  2020-09-07 11:44       ` Max Reitz
  0 siblings, 1 reply; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-04 13:48 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

04.09.2020 16:21, Max Reitz wrote:
> On 28.08.20 18:52, Andrey Shinkevich wrote:
>> To keep the base node unchanged during the block-stream operation,
>> freeze it as the other part of the backing chain with the intermediate
>> nodes related to the job.
>> This patch revers the change made with the commit c624b015bf as the
>> correct base file name and its format have to be written down to the
>> QCOW2 header on the disk when the backing file is being changed after
>> the stream job completes.
>> This reversion incurs changes in the tests 030, 245 and discards the
>> test 258 where concurrent stream/commit jobs are tested. When the link
>> to a base node is frozen, the concurrent job cannot change the common
>> backing chain.
>>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/stream.c             |  29 ++------
>>   tests/qemu-iotests/030     |  10 +--
>>   tests/qemu-iotests/245     |   2 +-
>>   tests/qemu-iotests/258     | 161 ---------------------------------------------
>>   tests/qemu-iotests/258.out |  33 ----------
>>   5 files changed, 14 insertions(+), 221 deletions(-)
>>   delete mode 100755 tests/qemu-iotests/258
>>   delete mode 100644 tests/qemu-iotests/258.out
> 
> Does this need to be in this series?  (I’m not entirely sure, based on
> what I can see in patch 7.)
> 
> When doing this, should we introduce a @bottom-node option alongside, so
> that we can make all the tests that are deleted here pass still, just
> with changes?
> 

That's a question to discuss, and I asked Andrey to make this patch in this
simple way to see, how much damage we have with this change.

Honestly, I doubt that we need the new option. Previously, we decided that
we can make this change without a deprecation. If we still going to do it,
we shouldn't care about these use cases. So, if we freeze base again wituhout
a depreaction and:

1. add bottom-node

  - we keep the iotests
  - If (unlikely) someone will came and say: hey, you've broken my working scenario, we will say "use bottom-node option, sorry"
  - Most likely we'll have unused option and corresponding unused logic, making code more complex for nothing (and we'll have to say "sorry" anyway)

2. without option

  - we loose the iotests. this looks scary, but it is honest: we drop use-cases and corresponding iotests
  - If (unlikely) someone will came and say: hey, you've broken my working scenario, he will have to wait for next release / package version / or update the downstream himself
  - Most likely all will be OK.


Hmm. OK, and the hard-way:

3. Enable all the new logic (filter insertion, freezing base, etc.) only when filter-node-name option specified. And immediately deprecate not-specifying the option.
  [Note, that in way [3] we don't need bottom-node option]



-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver
  2020-09-04 12:50   ` Max Reitz
@ 2020-09-04 13:59     ` Vladimir Sementsov-Ogievskiy
  2020-09-22 13:13       ` Andrey Shinkevich
  0 siblings, 1 reply; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-04 13:59 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

04.09.2020 15:50, Max Reitz wrote:
> On 28.08.20 18:52, Andrey Shinkevich wrote:
>> Limit the guest's COR operations by the base node in the backing chain
>> during a stream job.
> 
> I don’t understand.   Shouldn’t we limit the areas where we set the COR
> flag?
> 
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 49 insertions(+)
>>
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index 1f858bb..ecbd1f8 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -57,6 +57,37 @@ static BlockDriverState *get_base_by_name(BlockDriverState *bs,
>>       return base_bs;
>>   }
>>   
>> +/*
>> + * Returns 1 if the block is allocated in a node between cor_filter_bs->file->bs
>> + * and the base_bs in the backing chain. Otherwise, returns 0.
>> + * The COR operation is allowed if the base_bs is not specified - return 1.
>> + * Returns negative errno on failure.
>> + */
>> +static int node_above_base(BlockDriverState *cor_filter_bs, uint64_t offset,
>> +                           uint64_t bytes)
>> +{
>> +    int ret;
>> +    int64_t dummy;
>> +    BlockDriverState *file = NULL;
>> +    BDRVStateCOR *state = cor_filter_bs->opaque;
>> +
>> +    if (!state->base_bs) {
>> +        return 1;
>> +    }
>> +
>> +    ret = bdrv_block_status_above(cor_filter_bs->file->bs, state->base_bs,
>> +                                  offset, bytes, &dummy, NULL, &file);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    if (file) {
> 
> Why check file and not the return value?
> 
>> +        return 1;
>> +    }
>> +
>> +    return 0;
> 
> “dummy” should really not be called that way, it should be evaluated
> whether it’s smaller than bytes.
> 
> First, [offset, offset + dummy) may not be allocated above the base –
> but [offset + dummy, offset + bytes) may be.  Then this function returns
> 0 here, even though there is something in that range that’s allocated.
> 
> Second, in that case we still shouldn’t return 1, but return the
> shrunken offset instead.  Or, if there are multiple distinct allocated
> areas, they should probably even all be copied separately.
> 
> 
> (But all of that of course only if this function is intended to be used
> to limit where we set the COR flag, because I don’t understand why we’d
> want to limit where something can be written.)
> 

Agree to all.

1. Write path shouldn't be changed: it's a copy-on-read filter.

2. On read we need is_allocated_above-driven loop, to insert the flag only to regions allocated above base
  (and other regions we read just without the flag, don't skip them). qiov_offset will help very well.

3. Like in many other places, let's ignore block-status errors (and just add the flag if block_status fails)

>> +}
>> +
>>   static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>                       Error **errp)
>>   {
>> @@ -153,6 +184,12 @@ static int coroutine_fn cor_co_pwritev_part(BlockDriverState *bs,
>>                                               QEMUIOVector *qiov,
>>                                               size_t qiov_offset, int flags)
>>   {
>> +    int ret = node_above_base(bs, offset, bytes);
>> +
>> +    if (!ret || ret < 0) {
>> +        return ret;
>> +    }
>> +
>>       return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, qiov_offset,
>>                                   flags);
>>   }
>> @@ -162,6 +199,12 @@ static int coroutine_fn cor_co_pwrite_zeroes(BlockDriverState *bs,
>>                                                int64_t offset, int bytes,
>>                                                BdrvRequestFlags flags)
>>   {
>> +    int ret = node_above_base(bs, offset, bytes);
>> +
>> +    if (!ret || ret < 0) {
>> +        return ret;
>> +    }
>> +
>>       return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
>>   }
>>   
>> @@ -178,6 +221,12 @@ static int coroutine_fn cor_co_pwritev_compressed(BlockDriverState *bs,
>>                                                     uint64_t bytes,
>>                                                     QEMUIOVector *qiov)
>>   {
>> +    int ret = node_above_base(bs, offset, bytes);
>> +
>> +    if (!ret || ret < 0) {
>> +        return ret;
>> +    }
>> +
>>       return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
>>                              BDRV_REQ_WRITE_COMPRESSED);
>>   }
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/7] block-stream: freeze link to base node during stream job
  2020-09-04 13:48     ` Vladimir Sementsov-Ogievskiy
@ 2020-09-07 11:44       ` Max Reitz
  2020-09-07 12:17         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-07 11:44 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 3844 bytes --]

On 04.09.20 15:48, Vladimir Sementsov-Ogievskiy wrote:
> 04.09.2020 16:21, Max Reitz wrote:
>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>> To keep the base node unchanged during the block-stream operation,
>>> freeze it as the other part of the backing chain with the intermediate
>>> nodes related to the job.
>>> This patch revers the change made with the commit c624b015bf as the
>>> correct base file name and its format have to be written down to the
>>> QCOW2 header on the disk when the backing file is being changed after
>>> the stream job completes.
>>> This reversion incurs changes in the tests 030, 245 and discards the
>>> test 258 where concurrent stream/commit jobs are tested. When the link
>>> to a base node is frozen, the concurrent job cannot change the common
>>> backing chain.
>>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/stream.c             |  29 ++------
>>>   tests/qemu-iotests/030     |  10 +--
>>>   tests/qemu-iotests/245     |   2 +-
>>>   tests/qemu-iotests/258     | 161
>>> ---------------------------------------------
>>>   tests/qemu-iotests/258.out |  33 ----------
>>>   5 files changed, 14 insertions(+), 221 deletions(-)
>>>   delete mode 100755 tests/qemu-iotests/258
>>>   delete mode 100644 tests/qemu-iotests/258.out
>>
>> Does this need to be in this series?  (I’m not entirely sure, based on
>> what I can see in patch 7.)
>>
>> When doing this, should we introduce a @bottom-node option alongside, so
>> that we can make all the tests that are deleted here pass still, just
>> with changes?
>>
> 
> That's a question to discuss, and I asked Andrey to make this patch in this
> simple way to see, how much damage we have with this change.
> 
> Honestly, I doubt that we need the new option. Previously, we decided that
> we can make this change without a deprecation. If we still going to do it,
> we shouldn't care about these use cases. So, if we freeze base again
> wituhout
> a depreaction and:
> 
> 1. add bottom-node
> 
>  - we keep the iotests
>  - If (unlikely) someone will came and say: hey, you've broken my
> working scenario, we will say "use bottom-node option, sorry"
>  - Most likely we'll have unused option and corresponding unused logic,
> making code more complex for nothing (and we'll have to say "sorry" anyway)
> 
> 2. without option
> 
>  - we loose the iotests. this looks scary, but it is honest: we drop
> use-cases and corresponding iotests
>  - If (unlikely) someone will came and say: hey, you've broken my
> working scenario, he will have to wait for next release / package
> version / or update the downstream himself
>  - Most likely all will be OK.

Well, yes, we’ll disrupt either way, but it is a difference whether we
can tell people immediately that there’s an alternative now, or whether
we’ll have to make them wait for the next release.

Basically, the whole argument hinges on the question of whether anyone
uses this right now or not, and we just don’t know.

The question remains whether this patch is necessary for this series.
We also have the option of introducing @bottom-node, leaving @base’s
behavior as-is and explaining it as a legacy option from which
@bottom-node is inferred.  Then specifying @base just becomes weird and
problem-prone when the graph is reconfigured while the job is active,
but you can get around that by simply using the non-legacy option.

Max

> Hmm. OK, and the hard-way:
> 
> 3. Enable all the new logic (filter insertion, freezing base, etc.) only
> when filter-node-name option specified. And immediately deprecate
> not-specifying the option.
>  [Note, that in way [3] we don't need bottom-node option]
> 
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/7] block-stream: freeze link to base node during stream job
  2020-09-07 11:44       ` Max Reitz
@ 2020-09-07 12:17         ` Vladimir Sementsov-Ogievskiy
  2020-09-24 12:46           ` Max Reitz
  0 siblings, 1 reply; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-07 12:17 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

07.09.2020 14:44, Max Reitz wrote:
> On 04.09.20 15:48, Vladimir Sementsov-Ogievskiy wrote:
>> 04.09.2020 16:21, Max Reitz wrote:
>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>> To keep the base node unchanged during the block-stream operation,
>>>> freeze it as the other part of the backing chain with the intermediate
>>>> nodes related to the job.
>>>> This patch revers the change made with the commit c624b015bf as the
>>>> correct base file name and its format have to be written down to the
>>>> QCOW2 header on the disk when the backing file is being changed after
>>>> the stream job completes.
>>>> This reversion incurs changes in the tests 030, 245 and discards the
>>>> test 258 where concurrent stream/commit jobs are tested. When the link
>>>> to a base node is frozen, the concurrent job cannot change the common
>>>> backing chain.
>>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---
>>>>    block/stream.c             |  29 ++------
>>>>    tests/qemu-iotests/030     |  10 +--
>>>>    tests/qemu-iotests/245     |   2 +-
>>>>    tests/qemu-iotests/258     | 161
>>>> ---------------------------------------------
>>>>    tests/qemu-iotests/258.out |  33 ----------
>>>>    5 files changed, 14 insertions(+), 221 deletions(-)
>>>>    delete mode 100755 tests/qemu-iotests/258
>>>>    delete mode 100644 tests/qemu-iotests/258.out
>>>
>>> Does this need to be in this series?  (I’m not entirely sure, based on
>>> what I can see in patch 7.)
>>>
>>> When doing this, should we introduce a @bottom-node option alongside, so
>>> that we can make all the tests that are deleted here pass still, just
>>> with changes?
>>>
>>
>> That's a question to discuss, and I asked Andrey to make this patch in this
>> simple way to see, how much damage we have with this change.
>>
>> Honestly, I doubt that we need the new option. Previously, we decided that
>> we can make this change without a deprecation. If we still going to do it,
>> we shouldn't care about these use cases. So, if we freeze base again
>> wituhout
>> a depreaction and:
>>
>> 1. add bottom-node
>>
>>   - we keep the iotests
>>   - If (unlikely) someone will came and say: hey, you've broken my
>> working scenario, we will say "use bottom-node option, sorry"
>>   - Most likely we'll have unused option and corresponding unused logic,
>> making code more complex for nothing (and we'll have to say "sorry" anyway)
>>
>> 2. without option
>>
>>   - we loose the iotests. this looks scary, but it is honest: we drop
>> use-cases and corresponding iotests
>>   - If (unlikely) someone will came and say: hey, you've broken my
>> working scenario, he will have to wait for next release / package
>> version / or update the downstream himself
>>   - Most likely all will be OK.
> 
> Well, yes, we’ll disrupt either way, but it is a difference whether we
> can tell people immediately that there’s an alternative now, or whether
> we’ll have to make them wait for the next release.
> 
> Basically, the whole argument hinges on the question of whether anyone
> uses this right now or not, and we just don’t know.
> 
> The question remains whether this patch is necessary for this series.

Otherwise iotests fail :)

> We also have the option of introducing @bottom-node, leaving @base’s
> behavior as-is

You mean not make it freeze base again, but just don't care?

> and explaining it as a legacy option from which
> @bottom-node is inferred.  Then specifying @base just becomes weird and
> problem-prone when the graph is reconfigured while the job is active,
> but you can get around that by simply using the non-legacy option.

Hmm. Last time, I thought that bottom-node was a bad idea, as we have a lot of problems with it, but you think it should be kept as preferred behavior? But this sounds as working idea.

Then, we'll probably want to set skip_filters(bottom->backing) as backing of top in qcow2 metadata, and direct bottom->backing as new backing of top in block node graph.

Anyway, I like the idea to deprecate filename-based interfaces wherever we can.

PS: Sorry for my decreased attention to the list for last weeks, I have to finish necessary work for Virtuozzo release.

> 
> Max
> 
>> Hmm. OK, and the hard-way:
>>
>> 3. Enable all the new logic (filter insertion, freezing base, etc.) only
>> when filter-node-name option specified. And immediately deprecate
>> not-specifying the option.
>>   [Note, that in way [3] we don't need bottom-node option]
>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-04 11:22   ` Max Reitz
@ 2020-09-17 16:09     ` Andrey Shinkevich
  2020-09-23 14:38       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich @ 2020-09-17 16:09 UTC (permalink / raw)
  To: Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den, vsementsov

[-- Attachment #1: Type: text/plain, Size: 1988 bytes --]

On 04.09.2020 14:22, Max Reitz wrote:
> On 28.08.20 18:52, Andrey Shinkevich wrote:
>> Provide API for the COR-filter insertion/removal.
...
>
>> Also, drop the filter child permissions for an inactive state when the
>> filter node is being removed.
> Do we need .active for that?  Shouldn’t it be sufficient to just not
> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
> node (i.e. perm == 0 in cor_child_perm())?
>
> Of course, using an .active field works, too.  But Vladimir wanted a
> similar field in the preallocate filter, and there already is in
> backup-top.  I feel like there shouldn’t be a need for this.
>
> .bdrv_child_perm() should generally be able to decide what permissions
> it needs based on the information it gets.  If every driver needs to
> keep track of whether it has any parents and feed that information into
> .bdrv_child_perm() as external state, then something feels wrong.
>
> If perm == 0 is not sufficient to rule out any parents, we should just
> explicitly add a boolean that tells .bdrv_child_perm() whether there are
> any parents or not – shouldn’t be too difficult.


The issue is that we fail in the bdrv_check_update_perm() with 
""Conflicts with use by..." if the *nperm = 0 and the *nshared = 
BLK_PERM_ALL are not set before the call to the bdrv_replace_node(). The 
filter is still in the backing chain by that moment and has a parent 
with child->perm != 0.

The implementation of  the .bdrv_child_perm()in the given patch is 
similar to one in the block/mirror.c.

How could we resolve the issue at the generic layer?


Andrey


>
>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/copy-on-read.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   block/copy-on-read.h |  35 +++++++++++++++++
>>   2 files changed, 139 insertions(+)
>>   create mode 100644 block/copy-on-read.h
> [...]

[-- Attachment #2: Type: text/html, Size: 6996 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver
  2020-09-04 13:59     ` Vladimir Sementsov-Ogievskiy
@ 2020-09-22 13:13       ` Andrey Shinkevich
  2020-09-24 11:18         ` Max Reitz
  0 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich @ 2020-09-22 13:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

On 04.09.2020 16:59, Vladimir Sementsov-Ogievskiy wrote:
> 04.09.2020 15:50, Max Reitz wrote:
>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>> Limit the guest's COR operations by the base node in the backing chain
>>> during a stream job.
>>
>> I don’t understand.   Shouldn’t we limit the areas where we set the COR
>> flag?
>>
>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>> ---
>>>   block/copy-on-read.c | 49 
>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 49 insertions(+)
>>>
>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>> index 1f858bb..ecbd1f8 100644
>>> --- a/block/copy-on-read.c
>>> +++ b/block/copy-on-read.c
>>> @@ -57,6 +57,37 @@ static BlockDriverState 
>>> *get_base_by_name(BlockDriverState *bs,
>>>       return base_bs;
>>>   }
>>>   +/*
>>> + * Returns 1 if the block is allocated in a node between 
>>> cor_filter_bs->file->bs
>>> + * and the base_bs in the backing chain. Otherwise, returns 0.
>>> + * The COR operation is allowed if the base_bs is not specified - 
>>> return 1.
>>> + * Returns negative errno on failure.
>>> + */
>>> +static int node_above_base(BlockDriverState *cor_filter_bs, 
>>> uint64_t offset,
>>> +                           uint64_t bytes)
>>> +{
>>> +    int ret;
>>> +    int64_t dummy;
>>> +    BlockDriverState *file = NULL;
>>> +    BDRVStateCOR *state = cor_filter_bs->opaque;
>>> +
>>> +    if (!state->base_bs) {
>>> +        return 1;
>>> +    }
>>> +
>>> +    ret = bdrv_block_status_above(cor_filter_bs->file->bs, 
>>> state->base_bs,
>>> +                                  offset, bytes, &dummy, NULL, &file);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    if (file) {
>>
>> Why check file and not the return value?
>>
>>> +        return 1;
>>> +    }
>>> +
>>> +    return 0;
>>
>> “dummy” should really not be called that way, it should be evaluated
>> whether it’s smaller than bytes.
>>
>> First, [offset, offset + dummy) may not be allocated above the base –
>> but [offset + dummy, offset + bytes) may be.  Then this function returns
>> 0 here, even though there is something in that range that’s allocated.
>>
>> Second, in that case we still shouldn’t return 1, but return the
>> shrunken offset instead.  Or, if there are multiple distinct allocated
>> areas, they should probably even all be copied separately.
>>
>>
>> (But all of that of course only if this function is intended to be used
>> to limit where we set the COR flag, because I don’t understand why we’d
>> want to limit where something can be written.)
>>
>
> Agree to all.
>
> 1. Write path shouldn't be changed: it's a copy-on-read filter.
>
> 2. On read we need is_allocated_above-driven loop, to insert the flag 
> only to regions allocated above base
>  (and other regions we read just without the flag, don't skip them). 
> qiov_offset will help very well.
>
> 3. Like in many other places, let's ignore  errors (and just add the 
> flag if block_status fails)


If "block_status" fails, the stream job does not copy. Shall we keep the 
same behavior in the cor_co_preadv_part()?


Andrey

>
>>> +}
>>> +
>>>   static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>>                       Error **errp)
>>>   {
>>> @@ -153,6 +184,12 @@ static int coroutine_fn 
>>> cor_co_pwritev_part(BlockDriverState *bs,
>>>                                               QEMUIOVector *qiov,
>>>                                               size_t qiov_offset, 
>>> int flags)
>>>   {
>>> +    int ret = node_above_base(bs, offset, bytes);
>>> +
>>> +    if (!ret || ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>>       return bdrv_co_pwritev_part(bs->file, offset, bytes, qiov, 
>>> qiov_offset,
>>>                                   flags);
>>>   }
>>> @@ -162,6 +199,12 @@ static int coroutine_fn 
>>> cor_co_pwrite_zeroes(BlockDriverState *bs,
>>>                                                int64_t offset, int 
>>> bytes,
>>> BdrvRequestFlags flags)
>>>   {
>>> +    int ret = node_above_base(bs, offset, bytes);
>>> +
>>> +    if (!ret || ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>>       return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
>>>   }
>>>   @@ -178,6 +221,12 @@ static int coroutine_fn 
>>> cor_co_pwritev_compressed(BlockDriverState *bs,
>>>                                                     uint64_t bytes,
>>> QEMUIOVector *qiov)
>>>   {
>>> +    int ret = node_above_base(bs, offset, bytes);
>>> +
>>> +    if (!ret || ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>>       return bdrv_co_pwritev(bs->file, offset, bytes, qiov,
>>>                              BDRV_REQ_WRITE_COMPRESSED);
>>>   }
>>>
>>
>>
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-17 16:09     ` Andrey Shinkevich
@ 2020-09-23 14:38       ` Vladimir Sementsov-Ogievskiy
  2020-09-24 13:25         ` Max Reitz
  0 siblings, 1 reply; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-23 14:38 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

17.09.2020 19:09, Andrey Shinkevich wrote:
> On 04.09.2020 14:22, Max Reitz wrote:
>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>> Provide API for the COR-filter insertion/removal.
> ...
>>> Also, drop the filter child permissions for an inactive state when the
>>> filter node is being removed.
>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>> node (i.e. perm == 0 in cor_child_perm())?
>>
>> Of course, using an .active field works, too.  But Vladimir wanted a
>> similar field in the preallocate filter, and there already is in
>> backup-top.  I feel like there shouldn’t be a need for this.
>>
>> .bdrv_child_perm() should generally be able to decide what permissions
>> it needs based on the information it gets.  If every driver needs to
>> keep track of whether it has any parents and feed that information into
>> .bdrv_child_perm() as external state, then something feels wrong.
>>
>> If perm == 0 is not sufficient to rule out any parents, we should just
>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>> any parents or not – shouldn’t be too difficult.
> 
> 
> The issue is that we fail in the bdrv_check_update_perm() with ""Conflicts with use by..." if the *nperm = 0 and the *nshared = BLK_PERM_ALL are not set before the call to the bdrv_replace_node(). The filter is still in the backing chain by that moment and has a parent with child->perm != 0.
> 
> The implementation of  the .bdrv_child_perm()in the given patch is similar to one in the block/mirror.c.
> 
> How could we resolve the issue at the generic layer?
> 
> 

The problem is that when we update permissions in bdrv_replace_node, we consider new placement for "to" node, but old placement for "from" node. So, during update, we may consider stricter permission requirements for "from" than needed and they conflict with new "to" permissions.

We should consider all graph changes for permission update simultaneously, in same transaction. For this, we need refactor permission update system to work on queue of nodes instead of simple DFS recursion. And in the queue all the nodes to update should  be toplogically sorted. In this way, when we update node N, all its parents are already updated. And of course, we should make no-perm graph update before permission update, and rollback no-perm movement if permission update failed.

And we need topological sort anyway. Consider the following example:

A -
|  \
|  v
|  B
|  |
v  /
C<-

A is parent for B and C, B is parent for C.

Obviously, to update permissions, we should go in order A B C, so, when we update C, all it's parents permission already updated. But with current approach (simple recursion) we can update in sequence A C B C (C is updated twice). On first update of C, we consider old B permissions, so doing wrong thing. If it succeed, all is OK, on second C update we will finish with correct graph. But if the wrong thing failed, we break the whole process for no reason (it's possible that updated B permission will be less strict, but we will never check it).

I'll work on a patch for it.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 5/7] copy-on-read: limit guest writes to base in COR driver
  2020-09-22 13:13       ` Andrey Shinkevich
@ 2020-09-24 11:18         ` Max Reitz
  0 siblings, 0 replies; 29+ messages in thread
From: Max Reitz @ 2020-09-24 11:18 UTC (permalink / raw)
  To: Andrey Shinkevich, Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: kwolf, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 3706 bytes --]

On 22.09.20 15:13, Andrey Shinkevich wrote:
> On 04.09.2020 16:59, Vladimir Sementsov-Ogievskiy wrote:
>> 04.09.2020 15:50, Max Reitz wrote:
>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>> Limit the guest's COR operations by the base node in the backing chain
>>>> during a stream job.
>>>
>>> I don’t understand.   Shouldn’t we limit the areas where we set the COR
>>> flag?
>>>
>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>> ---
>>>>   block/copy-on-read.c | 49
>>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 49 insertions(+)
>>>>
>>>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>>>> index 1f858bb..ecbd1f8 100644
>>>> --- a/block/copy-on-read.c
>>>> +++ b/block/copy-on-read.c
>>>> @@ -57,6 +57,37 @@ static BlockDriverState
>>>> *get_base_by_name(BlockDriverState *bs,
>>>>       return base_bs;
>>>>   }
>>>>   +/*
>>>> + * Returns 1 if the block is allocated in a node between
>>>> cor_filter_bs->file->bs
>>>> + * and the base_bs in the backing chain. Otherwise, returns 0.
>>>> + * The COR operation is allowed if the base_bs is not specified -
>>>> return 1.
>>>> + * Returns negative errno on failure.
>>>> + */
>>>> +static int node_above_base(BlockDriverState *cor_filter_bs,
>>>> uint64_t offset,
>>>> +                           uint64_t bytes)
>>>> +{
>>>> +    int ret;
>>>> +    int64_t dummy;
>>>> +    BlockDriverState *file = NULL;
>>>> +    BDRVStateCOR *state = cor_filter_bs->opaque;
>>>> +
>>>> +    if (!state->base_bs) {
>>>> +        return 1;
>>>> +    }
>>>> +
>>>> +    ret = bdrv_block_status_above(cor_filter_bs->file->bs,
>>>> state->base_bs,
>>>> +                                  offset, bytes, &dummy, NULL, &file);
>>>> +    if (ret < 0) {
>>>> +        return ret;
>>>> +    }
>>>> +
>>>> +    if (file) {
>>>
>>> Why check file and not the return value?
>>>
>>>> +        return 1;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>
>>> “dummy” should really not be called that way, it should be evaluated
>>> whether it’s smaller than bytes.
>>>
>>> First, [offset, offset + dummy) may not be allocated above the base –
>>> but [offset + dummy, offset + bytes) may be.  Then this function returns
>>> 0 here, even though there is something in that range that’s allocated.
>>>
>>> Second, in that case we still shouldn’t return 1, but return the
>>> shrunken offset instead.  Or, if there are multiple distinct allocated
>>> areas, they should probably even all be copied separately.
>>>
>>>
>>> (But all of that of course only if this function is intended to be used
>>> to limit where we set the COR flag, because I don’t understand why we’d
>>> want to limit where something can be written.)
>>>
>>
>> Agree to all.
>>
>> 1. Write path shouldn't be changed: it's a copy-on-read filter.
>>
>> 2. On read we need is_allocated_above-driven loop, to insert the flag
>> only to regions allocated above base
>>  (and other regions we read just without the flag, don't skip them).
>> qiov_offset will help very well.
>>
>> 3. Like in many other places, let's ignore  errors (and just add the
>> flag if block_status fails)
> 
> 
> If "block_status" fails, the stream job does not copy. Shall we keep the
> same behavior in the cor_co_preadv_part()?

I think copying can’t really hurt, so I think it would be better to copy
if we aren’t sure (because block_status failed).  The difference to
streaming could well be considered a bug fix.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/7] block-stream: freeze link to base node during stream job
  2020-09-07 12:17         ` Vladimir Sementsov-Ogievskiy
@ 2020-09-24 12:46           ` Max Reitz
  0 siblings, 0 replies; 29+ messages in thread
From: Max Reitz @ 2020-09-24 12:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 5347 bytes --]

On 07.09.20 14:17, Vladimir Sementsov-Ogievskiy wrote:
> 07.09.2020 14:44, Max Reitz wrote:
>> On 04.09.20 15:48, Vladimir Sementsov-Ogievskiy wrote:
>>> 04.09.2020 16:21, Max Reitz wrote:
>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>> To keep the base node unchanged during the block-stream operation,
>>>>> freeze it as the other part of the backing chain with the intermediate
>>>>> nodes related to the job.
>>>>> This patch revers the change made with the commit c624b015bf as the
>>>>> correct base file name and its format have to be written down to the
>>>>> QCOW2 header on the disk when the backing file is being changed after
>>>>> the stream job completes.
>>>>> This reversion incurs changes in the tests 030, 245 and discards the
>>>>> test 258 where concurrent stream/commit jobs are tested. When the link
>>>>> to a base node is frozen, the concurrent job cannot change the common
>>>>> backing chain.
>>>>>
>>>>> Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
>>>>> ---
>>>>>    block/stream.c             |  29 ++------
>>>>>    tests/qemu-iotests/030     |  10 +--
>>>>>    tests/qemu-iotests/245     |   2 +-
>>>>>    tests/qemu-iotests/258     | 161
>>>>> ---------------------------------------------
>>>>>    tests/qemu-iotests/258.out |  33 ----------
>>>>>    5 files changed, 14 insertions(+), 221 deletions(-)
>>>>>    delete mode 100755 tests/qemu-iotests/258
>>>>>    delete mode 100644 tests/qemu-iotests/258.out
>>>>
>>>> Does this need to be in this series?  (I’m not entirely sure, based on
>>>> what I can see in patch 7.)
>>>>
>>>> When doing this, should we introduce a @bottom-node option
>>>> alongside, so
>>>> that we can make all the tests that are deleted here pass still, just
>>>> with changes?
>>>>
>>>
>>> That's a question to discuss, and I asked Andrey to make this patch
>>> in this
>>> simple way to see, how much damage we have with this change.
>>>
>>> Honestly, I doubt that we need the new option. Previously, we decided
>>> that
>>> we can make this change without a deprecation. If we still going to
>>> do it,
>>> we shouldn't care about these use cases. So, if we freeze base again
>>> wituhout
>>> a depreaction and:
>>>
>>> 1. add bottom-node
>>>
>>>   - we keep the iotests
>>>   - If (unlikely) someone will came and say: hey, you've broken my
>>> working scenario, we will say "use bottom-node option, sorry"
>>>   - Most likely we'll have unused option and corresponding unused logic,
>>> making code more complex for nothing (and we'll have to say "sorry"
>>> anyway)
>>>
>>> 2. without option
>>>
>>>   - we loose the iotests. this looks scary, but it is honest: we drop
>>> use-cases and corresponding iotests
>>>   - If (unlikely) someone will came and say: hey, you've broken my
>>> working scenario, he will have to wait for next release / package
>>> version / or update the downstream himself
>>>   - Most likely all will be OK.
>>
>> Well, yes, we’ll disrupt either way, but it is a difference whether we
>> can tell people immediately that there’s an alternative now, or whether
>> we’ll have to make them wait for the next release.
>>
>> Basically, the whole argument hinges on the question of whether anyone
>> uses this right now or not, and we just don’t know.
>>
>> The question remains whether this patch is necessary for this series.
> 
> Otherwise iotests fail :)
> 
>> We also have the option of introducing @bottom-node, leaving @base’s
>> behavior as-is
> 
> You mean not make it freeze base again, but just don't care?

Yes.  I think the only problem with that would be that it’s unintuitive
in case the graph is modified while the job is running, but I can’t find
that worse than forbidding that case completely.

(And I think it would be easier to explain if we introduced @bottom-node.)

>> and explaining it as a legacy option from which
>> @bottom-node is inferred.  Then specifying @base just becomes weird and
>> problem-prone when the graph is reconfigured while the job is active,
>> but you can get around that by simply using the non-legacy option.
> 
> Hmm. Last time, I thought that bottom-node was a bad idea, as we have a
> lot of problems with it,

Hm, did we?  Off the top of my head, I can’t remember any.  Besides the
fact that it would require users to use a different parameter and us to
support two parameters unless we decide to deprecate @base.

> but you think it should be kept as preferred
> behavior? But this sounds as working idea.
> 
> Then, we'll probably want to set skip_filters(bottom->backing) as
> backing of top in qcow2 metadata, and direct bottom->backing as new
> backing of top in block node graph.

I’m not sure whether I agree with skipping filters for the qcow2
metadata, just because then it’s different from the runtime state.  But
OTOH I would expect that any application that seriously cares about
filters would override the qcow2 metadata anyway, so I think I do agree
after all.

Yeah, I think skipping filters for the backing file name in the qcow2
header is right.

> Anyway, I like the idea to deprecate filename-based interfaces wherever
> we can.

Who doesn’t...

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-23 14:38       ` Vladimir Sementsov-Ogievskiy
@ 2020-09-24 13:25         ` Max Reitz
  2020-09-24 14:51           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-24 13:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 3603 bytes --]

On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
> 17.09.2020 19:09, Andrey Shinkevich wrote:
>> On 04.09.2020 14:22, Max Reitz wrote:
>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>> Provide API for the COR-filter insertion/removal.
>> ...
>>>> Also, drop the filter child permissions for an inactive state when the
>>>> filter node is being removed.
>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>> node (i.e. perm == 0 in cor_child_perm())?
>>>
>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>> similar field in the preallocate filter, and there already is in
>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>
>>> .bdrv_child_perm() should generally be able to decide what permissions
>>> it needs based on the information it gets.  If every driver needs to
>>> keep track of whether it has any parents and feed that information into
>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>
>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>>> any parents or not – shouldn’t be too difficult.
>>
>>
>> The issue is that we fail in the bdrv_check_update_perm() with
>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>> The filter is still in the backing chain by that moment and has a
>> parent with child->perm != 0.
>>
>> The implementation of  the .bdrv_child_perm()in the given patch is
>> similar to one in the block/mirror.c.
>>
>> How could we resolve the issue at the generic layer?
>>
>>
> 
> The problem is that when we update permissions in bdrv_replace_node, we
> consider new placement for "to" node, but old placement for "from" node.
> So, during update, we may consider stricter permission requirements for
> "from" than needed and they conflict with new "to" permissions.
> 
> We should consider all graph changes for permission update
> simultaneously, in same transaction. For this, we need refactor
> permission update system to work on queue of nodes instead of simple DFS
> recursion. And in the queue all the nodes to update should  be
> toplogically sorted. In this way, when we update node N, all its parents
> are already updated. And of course, we should make no-perm graph update
> before permission update, and rollback no-perm movement if permission
> update failed.

OK, you’ve convinced me, .active is good enough for now. O:)

> And we need topological sort anyway. Consider the following example:
> 
> A -
> |  \
> |  v
> |  B
> |  |
> v  /
> C<-
> 
> A is parent for B and C, B is parent for C.
> 
> Obviously, to update permissions, we should go in order A B C, so, when
> we update C, all it's parents permission already updated. But with
> current approach (simple recursion) we can update in sequence A C B C (C
> is updated twice). On first update of C, we consider old B permissions,
> so doing wrong thing. If it succeed, all is OK, on second C update we
> will finish with correct graph. But if the wrong thing failed, we break
> the whole process for no reason (it's possible that updated B permission
> will be less strict, but we will never check it).

True.

> I'll work on a patch for it.

Sounds great, though I fear for the complexity.  I’ll just wait and see
for now.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-24 13:25         ` Max Reitz
@ 2020-09-24 14:51           ` Vladimir Sementsov-Ogievskiy
  2020-09-24 15:00             ` Max Reitz
  0 siblings, 1 reply; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-24 14:51 UTC (permalink / raw)
  To: Max Reitz, Andrey Shinkevich, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

24.09.2020 16:25, Max Reitz wrote:
> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>> Provide API for the COR-filter insertion/removal.
>>> ...
>>>>> Also, drop the filter child permissions for an inactive state when the
>>>>> filter node is being removed.
>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>
>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>> similar field in the preallocate filter, and there already is in
>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>
>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>> it needs based on the information it gets.  If every driver needs to
>>>> keep track of whether it has any parents and feed that information into
>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>
>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>> explicitly add a boolean that tells .bdrv_child_perm() whether there are
>>>> any parents or not – shouldn’t be too difficult.
>>>
>>>
>>> The issue is that we fail in the bdrv_check_update_perm() with
>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>> The filter is still in the backing chain by that moment and has a
>>> parent with child->perm != 0.
>>>
>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>> similar to one in the block/mirror.c.
>>>
>>> How could we resolve the issue at the generic layer?
>>>
>>>
>>
>> The problem is that when we update permissions in bdrv_replace_node, we
>> consider new placement for "to" node, but old placement for "from" node.
>> So, during update, we may consider stricter permission requirements for
>> "from" than needed and they conflict with new "to" permissions.
>>
>> We should consider all graph changes for permission update
>> simultaneously, in same transaction. For this, we need refactor
>> permission update system to work on queue of nodes instead of simple DFS
>> recursion. And in the queue all the nodes to update should  be
>> toplogically sorted. In this way, when we update node N, all its parents
>> are already updated. And of course, we should make no-perm graph update
>> before permission update, and rollback no-perm movement if permission
>> update failed.
> 
> OK, you’ve convinced me, .active is good enough for now. O:)
> 
>> And we need topological sort anyway. Consider the following example:
>>
>> A -
>> |  \
>> |  v
>> |  B
>> |  |
>> v  /
>> C<-
>>
>> A is parent for B and C, B is parent for C.
>>
>> Obviously, to update permissions, we should go in order A B C, so, when
>> we update C, all it's parents permission already updated. But with
>> current approach (simple recursion) we can update in sequence A C B C (C
>> is updated twice). On first update of C, we consider old B permissions,
>> so doing wrong thing. If it succeed, all is OK, on second C update we
>> will finish with correct graph. But if the wrong thing failed, we break
>> the whole process for no reason (it's possible that updated B permission
>> will be less strict, but we will never check it).
> 
> True.
> 
>> I'll work on a patch for it.
> 
> Sounds great, though I fear for the complexity.  I’ll just wait and see
> for now.
> 

If you are OK with .active for now, then I think, Andrey can resend with
.active and I'll dig into permissions in parallel. If Andrey's series
go first, I'll just drop .active later if my idea works.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-24 14:51           ` Vladimir Sementsov-Ogievskiy
@ 2020-09-24 15:00             ` Max Reitz
  2020-09-24 17:29               ` Andrey Shinkevich
  0 siblings, 1 reply; 29+ messages in thread
From: Max Reitz @ 2020-09-24 15:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Andrey Shinkevich, qemu-block
  Cc: kwolf, jsnow, qemu-devel, armbru, den


[-- Attachment #1.1: Type: text/plain, Size: 4129 bytes --]

On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
> 24.09.2020 16:25, Max Reitz wrote:
>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>> Provide API for the COR-filter insertion/removal.
>>>> ...
>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>> the
>>>>>> filter node is being removed.
>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>
>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>> similar field in the preallocate filter, and there already is in
>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>
>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>> it needs based on the information it gets.  If every driver needs to
>>>>> keep track of whether it has any parents and feed that information
>>>>> into
>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>
>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>> there are
>>>>> any parents or not – shouldn’t be too difficult.
>>>>
>>>>
>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>> The filter is still in the backing chain by that moment and has a
>>>> parent with child->perm != 0.
>>>>
>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>> similar to one in the block/mirror.c.
>>>>
>>>> How could we resolve the issue at the generic layer?
>>>>
>>>>
>>>
>>> The problem is that when we update permissions in bdrv_replace_node, we
>>> consider new placement for "to" node, but old placement for "from" node.
>>> So, during update, we may consider stricter permission requirements for
>>> "from" than needed and they conflict with new "to" permissions.
>>>
>>> We should consider all graph changes for permission update
>>> simultaneously, in same transaction. For this, we need refactor
>>> permission update system to work on queue of nodes instead of simple DFS
>>> recursion. And in the queue all the nodes to update should  be
>>> toplogically sorted. In this way, when we update node N, all its parents
>>> are already updated. And of course, we should make no-perm graph update
>>> before permission update, and rollback no-perm movement if permission
>>> update failed.
>>
>> OK, you’ve convinced me, .active is good enough for now. O:)
>>
>>> And we need topological sort anyway. Consider the following example:
>>>
>>> A -
>>> |  \
>>> |  v
>>> |  B
>>> |  |
>>> v  /
>>> C<-
>>>
>>> A is parent for B and C, B is parent for C.
>>>
>>> Obviously, to update permissions, we should go in order A B C, so, when
>>> we update C, all it's parents permission already updated. But with
>>> current approach (simple recursion) we can update in sequence A C B C (C
>>> is updated twice). On first update of C, we consider old B permissions,
>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>> will finish with correct graph. But if the wrong thing failed, we break
>>> the whole process for no reason (it's possible that updated B permission
>>> will be less strict, but we will never check it).
>>
>> True.
>>
>>> I'll work on a patch for it.
>>
>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>> for now.
>>
> 
> If you are OK with .active for now, then I think, Andrey can resend with
> .active and I'll dig into permissions in parallel. If Andrey's series
> go first, I'll just drop .active later if my idea works.

Sure, that works for me.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-24 15:00             ` Max Reitz
@ 2020-09-24 17:29               ` Andrey Shinkevich
  2020-09-24 17:40                 ` Andrey Shinkevich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich @ 2020-09-24 17:29 UTC (permalink / raw)
  To: Max Reitz, Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

On 24.09.2020 18:00, Max Reitz wrote:
> On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
>> 24.09.2020 16:25, Max Reitz wrote:
>>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>>> Provide API for the COR-filter insertion/removal.
>>>>> ...
>>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>>> the
>>>>>>> filter node is being removed.
>>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>>
>>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>>> similar field in the preallocate filter, and there already is in
>>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>>
>>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>>> it needs based on the information it gets.  If every driver needs to
>>>>>> keep track of whether it has any parents and feed that information
>>>>>> into
>>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>>
>>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>>> there are
>>>>>> any parents or not – shouldn’t be too difficult.
>>>>>
>>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>>> The filter is still in the backing chain by that moment and has a
>>>>> parent with child->perm != 0.
>>>>>
>>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>>> similar to one in the block/mirror.c.
>>>>>
>>>>> How could we resolve the issue at the generic layer?
>>>>>
>>>>>
>>>> The problem is that when we update permissions in bdrv_replace_node, we
>>>> consider new placement for "to" node, but old placement for "from" node.
>>>> So, during update, we may consider stricter permission requirements for
>>>> "from" than needed and they conflict with new "to" permissions.
>>>>
>>>> We should consider all graph changes for permission update
>>>> simultaneously, in same transaction. For this, we need refactor
>>>> permission update system to work on queue of nodes instead of simple DFS
>>>> recursion. And in the queue all the nodes to update should  be
>>>> toplogically sorted. In this way, when we update node N, all its parents
>>>> are already updated. And of course, we should make no-perm graph update
>>>> before permission update, and rollback no-perm movement if permission
>>>> update failed.
>>> OK, you’ve convinced me, .active is good enough for now. O:)
>>>
>>>> And we need topological sort anyway. Consider the following example:
>>>>
>>>> A -
>>>> |  \
>>>> |  v
>>>> |  B
>>>> |  |
>>>> v  /
>>>> C<-
>>>>
>>>> A is parent for B and C, B is parent for C.
>>>>
>>>> Obviously, to update permissions, we should go in order A B C, so, when
>>>> we update C, all it's parents permission already updated. But with
>>>> current approach (simple recursion) we can update in sequence A C B C (C
>>>> is updated twice). On first update of C, we consider old B permissions,
>>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>>> will finish with correct graph. But if the wrong thing failed, we break
>>>> the whole process for no reason (it's possible that updated B permission
>>>> will be less strict, but we will never check it).
>>> True.
>>>
>>>> I'll work on a patch for it.
>>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>>> for now.
>>>
>> If you are OK with .active for now, then I think, Andrey can resend with
>> .active and I'll dig into permissions in parallel. If Andrey's series
>> go first, I'll just drop .active later if my idea works.
> Sure, that works for me.
>
> Max


So, I am keeping the filter insert/remove functions in the COR-driver 
code for now rather than moving them to the block generic layer, aren't I?

Andrey



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-24 17:29               ` Andrey Shinkevich
@ 2020-09-24 17:40                 ` Andrey Shinkevich
  2020-09-24 17:49                   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 29+ messages in thread
From: Andrey Shinkevich @ 2020-09-24 17:40 UTC (permalink / raw)
  To: Max Reitz, Vladimir Sementsov-Ogievskiy, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

On 24.09.2020 20:29, Andrey Shinkevich wrote:
> On 24.09.2020 18:00, Max Reitz wrote:
>> On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
>>> 24.09.2020 16:25, Max Reitz wrote:
>>>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>>>> Provide API for the COR-filter insertion/removal.
>>>>>> ...
>>>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>>>> the
>>>>>>>> filter node is being removed.
>>>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just 
>>>>>>> not
>>>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken 
>>>>>>> on the
>>>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>>>
>>>>>>> Of course, using an .active field works, too.  But Vladimir 
>>>>>>> wanted a
>>>>>>> similar field in the preallocate filter, and there already is in
>>>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>>>
>>>>>>> .bdrv_child_perm() should generally be able to decide what 
>>>>>>> permissions
>>>>>>> it needs based on the information it gets.  If every driver 
>>>>>>> needs to
>>>>>>> keep track of whether it has any parents and feed that information
>>>>>>> into
>>>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>>>
>>>>>>> If perm == 0 is not sufficient to rule out any parents, we 
>>>>>>> should just
>>>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>>>> there are
>>>>>>> any parents or not – shouldn’t be too difficult.
>>>>>>
>>>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>>>> The filter is still in the backing chain by that moment and has a
>>>>>> parent with child->perm != 0.
>>>>>>
>>>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>>>> similar to one in the block/mirror.c.
>>>>>>
>>>>>> How could we resolve the issue at the generic layer?
>>>>>>
>>>>>>
>>>>> The problem is that when we update permissions in 
>>>>> bdrv_replace_node, we
>>>>> consider new placement for "to" node, but old placement for "from" 
>>>>> node.
>>>>> So, during update, we may consider stricter permission 
>>>>> requirements for
>>>>> "from" than needed and they conflict with new "to" permissions.
>>>>>
>>>>> We should consider all graph changes for permission update
>>>>> simultaneously, in same transaction. For this, we need refactor
>>>>> permission update system to work on queue of nodes instead of 
>>>>> simple DFS
>>>>> recursion. And in the queue all the nodes to update should  be
>>>>> toplogically sorted. In this way, when we update node N, all its 
>>>>> parents
>>>>> are already updated. And of course, we should make no-perm graph 
>>>>> update
>>>>> before permission update, and rollback no-perm movement if permission
>>>>> update failed.
>>>> OK, you’ve convinced me, .active is good enough for now. O:)
>>>>
>>>>> And we need topological sort anyway. Consider the following example:
>>>>>
>>>>> A -
>>>>> |  \
>>>>> |  v
>>>>> |  B
>>>>> |  |
>>>>> v  /
>>>>> C<-
>>>>>
>>>>> A is parent for B and C, B is parent for C.
>>>>>
>>>>> Obviously, to update permissions, we should go in order A B C, so, 
>>>>> when
>>>>> we update C, all it's parents permission already updated. But with
>>>>> current approach (simple recursion) we can update in sequence A C 
>>>>> B C (C
>>>>> is updated twice). On first update of C, we consider old B 
>>>>> permissions,
>>>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>>>> will finish with correct graph. But if the wrong thing failed, we 
>>>>> break
>>>>> the whole process for no reason (it's possible that updated B 
>>>>> permission
>>>>> will be less strict, but we will never check it).
>>>> True.
>>>>
>>>>> I'll work on a patch for it.
>>>> Sounds great, though I fear for the complexity.  I’ll just wait and 
>>>> see
>>>> for now.
>>>>
>>> If you are OK with .active for now, then I think, Andrey can resend 
>>> with
>>> .active and I'll dig into permissions in parallel. If Andrey's series
>>> go first, I'll just drop .active later if my idea works.
>> Sure, that works for me.
>>
>> Max
>
>
> So, I am keeping the filter insert/remove functions in the COR-driver 
> code for now rather than moving them to the block generic layer, 
> aren't I?
>
> Andrey
>

As a concession, we can invoke .bdrv_insert/remove driver functions 
within the generic API ones.

Andrey



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/7] copy-on-read: add filter append/drop functions
  2020-09-24 17:40                 ` Andrey Shinkevich
@ 2020-09-24 17:49                   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 29+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-09-24 17:49 UTC (permalink / raw)
  To: Andrey Shinkevich, Max Reitz, qemu-block
  Cc: qemu-devel, kwolf, armbru, jsnow, eblake, den

24.09.2020 20:40, Andrey Shinkevich wrote:
> On 24.09.2020 20:29, Andrey Shinkevich wrote:
>> On 24.09.2020 18:00, Max Reitz wrote:
>>> On 24.09.20 16:51, Vladimir Sementsov-Ogievskiy wrote:
>>>> 24.09.2020 16:25, Max Reitz wrote:
>>>>> On 23.09.20 16:38, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> 17.09.2020 19:09, Andrey Shinkevich wrote:
>>>>>>> On 04.09.2020 14:22, Max Reitz wrote:
>>>>>>>> On 28.08.20 18:52, Andrey Shinkevich wrote:
>>>>>>>>> Provide API for the COR-filter insertion/removal.
>>>>>>> ...
>>>>>>>>> Also, drop the filter child permissions for an inactive state when
>>>>>>>>> the
>>>>>>>>> filter node is being removed.
>>>>>>>> Do we need .active for that?  Shouldn’t it be sufficient to just not
>>>>>>>> require BLK_PERM_WRITE_UNCHANGED when no permissions are taken on the
>>>>>>>> node (i.e. perm == 0 in cor_child_perm())?
>>>>>>>>
>>>>>>>> Of course, using an .active field works, too.  But Vladimir wanted a
>>>>>>>> similar field in the preallocate filter, and there already is in
>>>>>>>> backup-top.  I feel like there shouldn’t be a need for this.
>>>>>>>>
>>>>>>>> .bdrv_child_perm() should generally be able to decide what permissions
>>>>>>>> it needs based on the information it gets.  If every driver needs to
>>>>>>>> keep track of whether it has any parents and feed that information
>>>>>>>> into
>>>>>>>> .bdrv_child_perm() as external state, then something feels wrong.
>>>>>>>>
>>>>>>>> If perm == 0 is not sufficient to rule out any parents, we should just
>>>>>>>> explicitly add a boolean that tells .bdrv_child_perm() whether
>>>>>>>> there are
>>>>>>>> any parents or not – shouldn’t be too difficult.
>>>>>>>
>>>>>>> The issue is that we fail in the bdrv_check_update_perm() with
>>>>>>> ""Conflicts with use by..." if the *nperm = 0 and the *nshared =
>>>>>>> BLK_PERM_ALL are not set before the call to the bdrv_replace_node().
>>>>>>> The filter is still in the backing chain by that moment and has a
>>>>>>> parent with child->perm != 0.
>>>>>>>
>>>>>>> The implementation of  the .bdrv_child_perm()in the given patch is
>>>>>>> similar to one in the block/mirror.c.
>>>>>>>
>>>>>>> How could we resolve the issue at the generic layer?
>>>>>>>
>>>>>>>
>>>>>> The problem is that when we update permissions in bdrv_replace_node, we
>>>>>> consider new placement for "to" node, but old placement for "from" node.
>>>>>> So, during update, we may consider stricter permission requirements for
>>>>>> "from" than needed and they conflict with new "to" permissions.
>>>>>>
>>>>>> We should consider all graph changes for permission update
>>>>>> simultaneously, in same transaction. For this, we need refactor
>>>>>> permission update system to work on queue of nodes instead of simple DFS
>>>>>> recursion. And in the queue all the nodes to update should  be
>>>>>> toplogically sorted. In this way, when we update node N, all its parents
>>>>>> are already updated. And of course, we should make no-perm graph update
>>>>>> before permission update, and rollback no-perm movement if permission
>>>>>> update failed.
>>>>> OK, you’ve convinced me, .active is good enough for now. O:)
>>>>>
>>>>>> And we need topological sort anyway. Consider the following example:
>>>>>>
>>>>>> A -
>>>>>> |  \
>>>>>> |  v
>>>>>> |  B
>>>>>> |  |
>>>>>> v  /
>>>>>> C<-
>>>>>>
>>>>>> A is parent for B and C, B is parent for C.
>>>>>>
>>>>>> Obviously, to update permissions, we should go in order A B C, so, when
>>>>>> we update C, all it's parents permission already updated. But with
>>>>>> current approach (simple recursion) we can update in sequence A C B C (C
>>>>>> is updated twice). On first update of C, we consider old B permissions,
>>>>>> so doing wrong thing. If it succeed, all is OK, on second C update we
>>>>>> will finish with correct graph. But if the wrong thing failed, we break
>>>>>> the whole process for no reason (it's possible that updated B permission
>>>>>> will be less strict, but we will never check it).
>>>>> True.
>>>>>
>>>>>> I'll work on a patch for it.
>>>>> Sounds great, though I fear for the complexity.  I’ll just wait and see
>>>>> for now.
>>>>>
>>>> If you are OK with .active for now, then I think, Andrey can resend with
>>>> .active and I'll dig into permissions in parallel. If Andrey's series
>>>> go first, I'll just drop .active later if my idea works.
>>> Sure, that works for me.
>>>
>>> Max
>>
>>
>> So, I am keeping the filter insert/remove functions in the COR-driver code for now rather than moving them to the block generic layer, aren't I?
>>
>> Andrey
>>
> 
> As a concession, we can invoke .bdrv_insert/remove driver functions within the generic API ones.
> 
> Andrey
> 

No, such handlers will not help I think. Until we improve permission-update system we can't implement good generic insertion code. So, I'd keep the patch as is for now.

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-09-24 17:50 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28 16:52 [PATCH v8 0/7] Apply COR-filter to the block-stream permanently Andrey Shinkevich via
2020-08-28 16:52 ` [PATCH v8 1/7] copy-on-read: Support preadv/pwritev_part functions Andrey Shinkevich via
2020-08-28 16:52 ` [PATCH v8 2/7] copy-on-read: add filter append/drop functions Andrey Shinkevich via
2020-09-04 11:22   ` Max Reitz
2020-09-17 16:09     ` Andrey Shinkevich
2020-09-23 14:38       ` Vladimir Sementsov-Ogievskiy
2020-09-24 13:25         ` Max Reitz
2020-09-24 14:51           ` Vladimir Sementsov-Ogievskiy
2020-09-24 15:00             ` Max Reitz
2020-09-24 17:29               ` Andrey Shinkevich
2020-09-24 17:40                 ` Andrey Shinkevich
2020-09-24 17:49                   ` Vladimir Sementsov-Ogievskiy
2020-08-28 16:52 ` [PATCH v8 3/7] qapi: add filter-node-name to block-stream Andrey Shinkevich via
2020-08-28 16:52 ` [PATCH v8 4/7] copy-on-read: pass base file name to COR driver Andrey Shinkevich via
2020-09-04 12:17   ` Max Reitz
2020-09-04 12:26     ` Vladimir Sementsov-Ogievskiy
2020-08-28 16:52 ` [PATCH v8 5/7] copy-on-read: limit guest writes to base in " Andrey Shinkevich via
2020-09-04 12:50   ` Max Reitz
2020-09-04 13:59     ` Vladimir Sementsov-Ogievskiy
2020-09-22 13:13       ` Andrey Shinkevich
2020-09-24 11:18         ` Max Reitz
2020-08-28 16:52 ` [PATCH v8 6/7] block-stream: freeze link to base node during stream job Andrey Shinkevich via
2020-09-04 13:21   ` Max Reitz
2020-09-04 13:48     ` Vladimir Sementsov-Ogievskiy
2020-09-07 11:44       ` Max Reitz
2020-09-07 12:17         ` Vladimir Sementsov-Ogievskiy
2020-09-24 12:46           ` Max Reitz
2020-08-28 16:52 ` [PATCH v8 7/7] block: apply COR-filter to block-stream jobs Andrey Shinkevich via
2020-09-04 13:41   ` Max Reitz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.