* [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup
@ 2018-12-29 12:20 Vladimir Sementsov-Ogievskiy
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
                   ` (11 more replies)
  0 siblings, 12 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Hi all!

This series introduces the backup-top driver. It is a filter node that
performs the copy-before-write operation. Mirror already uses a filter
node to handle guest writes, so let's move backup from write-notifiers
to a filter node too.

v5:

Now, based on John's bitmaps branch

patch "block: allow serialized reads to intersect" removed
03: reword comment (Kevin)
07: fix comments style
08: new
09: mostly rewritten
10: fix comment style
11: - fix comments style
    - fix uninitialized backup_top
    - fix return -> goto error if failed to
      append filter

v4:
fixes, rewrite the driver to be implicit, drop the new interfaces and
don't move to BdrvDirtyBitmap for now, as it's not obvious whether it
will really be needed, and it is no longer related to this series.

v3 was "[PATCH v3 00/18] fleecing-hook driver for backup"

v2 was "[RFC v2] new, node-graph-based fleecing and backup"

----

This series (v5) is based on John's bitmaps branch

Based-on: https://github.com/jnsnow/qemu bitmaps

Vladimir Sementsov-Ogievskiy (11):
  block/backup: simplify backup_incremental_init_copy_bitmap
  block/backup: move to copy_bitmap with granularity
  block: improve should_update_child
  iotests: handle -f argument correctly for qemu_io_silent
  iotests: allow resume_drive by node name
  iotests: prepare 055 to graph changes during backup job
  block: introduce backup-top filter driver
  block/io: refactor wait_serialising_requests
  block: add lock/unlock range functions
  block/backup: tiny refactor backup_job_create
  block/backup: use backup-top instead of write notifiers

 block/backup-top.h            |  43 ++++
 include/block/block_int.h     |   4 +
 block.c                       |  33 ++-
 block/backup-top.c            | 306 ++++++++++++++++++++++++
 block/backup.c                | 424 +++++++++++++++++-----------------
 block/io.c                    |  69 +++++-
 block/Makefile.objs           |   2 +
 tests/qemu-iotests/055        |  23 +-
 tests/qemu-iotests/iotests.py |  16 +-
 9 files changed, 682 insertions(+), 238 deletions(-)
 create mode 100644 block/backup-top.h
 create mode 100644 block/backup-top.c

-- 
2.18.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-14 13:10   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Simplify backup_incremental_init_copy_bitmap using the function
bdrv_dirty_bitmap_next_dirty_area.

Note: switch to job->len instead of the bitmap size; it should not
matter, but it means less code.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 40 ++++++++++++----------------------------
 1 file changed, 12 insertions(+), 28 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 435414e964..fbe7ce19e1 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 /* init copy_bitmap from sync_bitmap */
 static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 {
-    BdrvDirtyBitmapIter *dbi;
-    int64_t offset;
-    int64_t end = DIV_ROUND_UP(bdrv_dirty_bitmap_size(job->sync_bitmap),
-                               job->cluster_size);
-
-    dbi = bdrv_dirty_iter_new(job->sync_bitmap);
-    while ((offset = bdrv_dirty_iter_next(dbi)) != -1) {
-        int64_t cluster = offset / job->cluster_size;
-        int64_t next_cluster;
-
-        offset += bdrv_dirty_bitmap_granularity(job->sync_bitmap);
-        if (offset >= bdrv_dirty_bitmap_size(job->sync_bitmap)) {
-            hbitmap_set(job->copy_bitmap, cluster, end - cluster);
-            break;
-        }
+    uint64_t offset = 0;
+    uint64_t bytes = job->len;
 
-        offset = bdrv_dirty_bitmap_next_zero(job->sync_bitmap, offset,
-                                             UINT64_MAX);
-        if (offset == -1) {
-            hbitmap_set(job->copy_bitmap, cluster, end - cluster);
-            break;
-        }
+    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
+                                             &offset, &bytes))
+    {
+        uint64_t cluster = offset / job->cluster_size;
+        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
 
-        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
-        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
-        if (next_cluster >= end) {
+        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
+
+        offset = (last_cluster + 1) * job->cluster_size;
+        if (offset >= job->len) {
             break;
         }
-
-        bdrv_set_dirty_iter(dbi, next_cluster * job->cluster_size);
+        bytes = job->len - offset;
     }
 
     /* TODO job_progress_set_remaining() would make more sense */
     job_progress_update(&job->common.job,
         job->len - hbitmap_count(job->copy_bitmap) * job->cluster_size);
-
-    bdrv_dirty_iter_free(dbi);
 }
 
 static int coroutine_fn backup_run(Job *job, Error **errp)
-- 
2.18.0
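The behaviour of the simplified loop can be sketched in Python (an
illustrative model, not QEMU code; `next_dirty_area` is a stand-in for
bdrv_dirty_bitmap_next_dirty_area, operating on a plain list of dirty
byte ranges):

```python
def next_dirty_area(dirty_ranges, offset, nbytes):
    # Stand-in for bdrv_dirty_bitmap_next_dirty_area: return the first
    # dirty (offset, bytes) pair intersecting [offset, offset + nbytes),
    # or None.  dirty_ranges is a sorted list of (start, length) pairs.
    for start, length in dirty_ranges:
        lo = max(start, offset)
        hi = min(start + length, offset + nbytes)
        if lo < hi:
            return lo, hi - lo
    return None

def init_copy_bitmap(dirty_ranges, cluster_size, disk_len):
    # Mirrors the simplified backup_incremental_init_copy_bitmap loop:
    # mark every cluster overlapped by some dirty area.
    copy_clusters = set()
    offset, nbytes = 0, disk_len
    while True:
        area = next_dirty_area(dirty_ranges, offset, nbytes)
        if area is None:
            break
        off, length = area
        cluster = off // cluster_size
        last_cluster = (off + length) // cluster_size
        copy_clusters.update(range(cluster, last_cluster + 1))
        offset = (last_cluster + 1) * cluster_size
        if offset >= disk_len:
            break
        nbytes = disk_len - offset
    return copy_clusters
```

A dirty area straddling a cluster boundary marks both clusters, matching
the `last_cluster - cluster + 1` arithmetic in the patch.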

* [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-14 14:10   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

We are going to share this bitmap between backup and the backup-top
filter driver, so let's make it track something more meaningful: byte
offsets at cluster granularity instead of cluster indices. This also
simplifies some calculations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 48 +++++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index fbe7ce19e1..0b3fddeb6c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -114,7 +114,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
     int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
-    hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
+    assert(QEMU_IS_ALIGNED(start, job->cluster_size));
+    hbitmap_reset(job->copy_bitmap, start, job->cluster_size);
     nbytes = MIN(job->cluster_size, job->len - start);
     if (!*bounce_buffer) {
         *bounce_buffer = blk_blockalign(blk, job->cluster_size);
@@ -150,7 +151,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
 
     return nbytes;
 fail:
-    hbitmap_set(job->copy_bitmap, start / job->cluster_size, 1);
+    hbitmap_set(job->copy_bitmap, start, job->cluster_size);
     return ret;
 
 }
@@ -170,16 +171,15 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
+    assert(QEMU_IS_ALIGNED(start, job->cluster_size));
     nbytes = MIN(job->copy_range_size, end - start);
     nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
-    hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
-                  nr_clusters);
+    hbitmap_reset(job->copy_bitmap, start, job->cluster_size * nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
                             read_flags, write_flags);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
-        hbitmap_set(job->copy_bitmap, start / job->cluster_size,
-                    nr_clusters);
+        hbitmap_set(job->copy_bitmap, start, job->cluster_size * nr_clusters);
         return ret;
     }
 
@@ -207,7 +207,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     cow_request_begin(&cow_request, job, start, end);
 
     while (start < end) {
-        if (!hbitmap_get(job->copy_bitmap, start / job->cluster_size)) {
+        if (!hbitmap_get(job->copy_bitmap, start)) {
             trace_backup_do_cow_skip(job, start);
             start += job->cluster_size;
             continue; /* already copied */
@@ -303,6 +303,11 @@ static void backup_clean(Job *job)
     assert(s->target);
     blk_unref(s->target);
     s->target = NULL;
+
+    if (s->copy_bitmap) {
+        hbitmap_free(s->copy_bitmap);
+        s->copy_bitmap = NULL;
+    }
 }
 
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
@@ -315,7 +320,6 @@ static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 void backup_do_checkpoint(BlockJob *job, Error **errp)
 {
     BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-    int64_t len;
 
     assert(block_job_driver(job) == &backup_job_driver);
 
@@ -325,8 +329,7 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
         return;
     }
 
-    len = DIV_ROUND_UP(backup_job->len, backup_job->cluster_size);
-    hbitmap_set(backup_job->copy_bitmap, 0, len);
+    hbitmap_set(backup_job->copy_bitmap, 0, backup_job->len);
 }
 
 static void backup_drain(BlockJob *job)
@@ -381,16 +384,16 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
 {
     int ret;
     bool error_is_read;
-    int64_t cluster;
+    int64_t offset;
     HBitmapIter hbi;
 
     hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
-    while ((cluster = hbitmap_iter_next(&hbi)) != -1) {
+    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
         do {
             if (yield_and_check(job)) {
                 return 0;
             }
-            ret = backup_do_cow(job, cluster * job->cluster_size,
+            ret = backup_do_cow(job, offset,
                                 job->cluster_size, &error_is_read, false);
             if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
                            BLOCK_ERROR_ACTION_REPORT)
@@ -412,12 +415,9 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
     while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
                                              &offset, &bytes))
     {
-        uint64_t cluster = offset / job->cluster_size;
-        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
+        hbitmap_set(job->copy_bitmap, offset, bytes);
 
-        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
-
-        offset = (last_cluster + 1) * job->cluster_size;
+        offset += bytes;
         if (offset >= job->len) {
             break;
         }
@@ -426,30 +426,27 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 
     /* TODO job_progress_set_remaining() would make more sense */
     job_progress_update(&job->common.job,
-        job->len - hbitmap_count(job->copy_bitmap) * job->cluster_size);
+        job->len - hbitmap_count(job->copy_bitmap));
 }
 
 static int coroutine_fn backup_run(Job *job, Error **errp)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
     BlockDriverState *bs = blk_bs(s->common.blk);
-    int64_t offset, nb_clusters;
+    int64_t offset;
     int ret = 0;
 
     QLIST_INIT(&s->inflight_reqs);
     qemu_co_rwlock_init(&s->flush_rwlock);
 
-    nb_clusters = DIV_ROUND_UP(s->len, s->cluster_size);
     job_progress_set_remaining(job, s->len);
 
-    s->copy_bitmap = hbitmap_alloc(nb_clusters, 0);
     if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         backup_incremental_init_copy_bitmap(s);
     } else {
-        hbitmap_set(s->copy_bitmap, 0, nb_clusters);
+        hbitmap_set(s->copy_bitmap, 0, s->len);
     }
 
-
     s->before_write.notify = backup_before_write_notify;
     bdrv_add_before_write_notifier(bs, &s->before_write);
 
@@ -530,7 +527,6 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
     /* wait until pending backup_do_cow() calls have completed */
     qemu_co_rwlock_wrlock(&s->flush_rwlock);
     qemu_co_rwlock_unlock(&s->flush_rwlock);
-    hbitmap_free(s->copy_bitmap);
 
     return ret;
 }
@@ -681,6 +677,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     } else {
         job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
     }
+
+    job->copy_bitmap = hbitmap_alloc(len, ctz32(job->cluster_size));
     job->use_copy_range = true;
     job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
                                         blk_get_max_transfer(job->target));
-- 
2.18.0
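The semantics the patch relies on can be pictured with a toy model of a
byte-addressed bitmap with granularity (an assumption-level sketch of
the hbitmap interface, not QEMU's implementation): callers pass byte
offsets and lengths, while each stored bit covers `2**gran` bytes, and
the count is reported back in bytes.

```python
class GranularBitmap:
    # Toy model of a bitmap created like hbitmap_alloc(size, gran):
    # bit i covers the byte range [i << gran, (i + 1) << gran).
    def __init__(self, size, gran):
        self.size, self.gran, self.bits = size, gran, set()

    def _span(self, offset, nbytes):
        first = offset >> self.gran
        last = (offset + nbytes - 1) >> self.gran
        return range(first, last + 1)

    def set(self, offset, nbytes):
        self.bits.update(self._span(offset, nbytes))

    def reset(self, offset, nbytes):
        self.bits.difference_update(self._span(offset, nbytes))

    def get(self, offset):
        return (offset >> self.gran) in self.bits

    def count(self):
        # Like hbitmap_count() on a granular bitmap: bytes covered
        # by set bits, so job->len - count() is progress in bytes.
        return len(self.bits) << self.gran
```

With this model the patch's `hbitmap_set(copy_bitmap, 0, job->len)` and
`job->len - hbitmap_count(copy_bitmap)` lines need no cluster-division
arithmetic at the call sites.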

* [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-14 14:32   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

As the comment already says, we don't want to create loops in
parent->child relations. So, when we try to append @to to @c, we should
check that @c is not in the children subtree of @to, and we should
check it recursively, not only at the first level. The patch provides a
BFS-based search to check the relations.

This is needed for further fleecing-hook filter usage: we need to
append it to the source when the hook is already a parent of the
target, and the source may be in the backing chain of the target
(fleecing scheme). So, on appending, the hook must not become a child
(direct or through the children subtree) of the target.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 4f5ff2cc12..8f04f293da 100644
--- a/block.c
+++ b/block.c
@@ -3536,7 +3536,7 @@ void bdrv_close_all(void)
 
 static bool should_update_child(BdrvChild *c, BlockDriverState *to)
 {
-    BdrvChild *to_c;
+    GList *queue = NULL, *pos;
 
     if (c->role->stay_at_node) {
         return false;
@@ -3572,13 +3572,36 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
      * if A is a child of B, that means we cannot replace A by B there
      * because that would create a loop.  Silently detaching A from B
      * is also not really an option.  So overall just leaving A in
-     * place there is the most sensible choice. */
-    QLIST_FOREACH(to_c, &to->children, next) {
-        if (to_c == c) {
-            return false;
+     * place there is the most sensible choice.
+     *
+     * We would also create a loop in any cases where @c is only
+     * indirectly referenced by @to. Prevent this by returning false
+     * if @c is found (by breadth-first search) anywhere in the whole
+     * subtree of @to.
+     */
+
+    pos = queue = g_list_append(queue, to);
+    while (pos) {
+        BlockDriverState *v = pos->data;
+        BdrvChild *c2;
+
+        QLIST_FOREACH(c2, &v->children, next) {
+            if (c2 == c) {
+                g_list_free(queue);
+                return false;
+            }
+
+            if (g_list_find(queue, c2->bs)) {
+                continue;
+            }
+
+            queue = g_list_append(queue, c2->bs);
         }
+
+        pos = pos->next;
     }
 
+    g_list_free(queue);
     return true;
 }
 
-- 
2.18.0
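The search itself can be sketched in Python (a hypothetical node
representation: each node carries a list of child references, each
reference pointing at another node — loosely mirroring
BlockDriverState/BdrvChild):

```python
from collections import deque

def reaches_child(to, c):
    # Breadth-first search over the children subtree of @to: return
    # True if the child reference @c occurs anywhere in it, i.e.
    # redirecting @c to @to would create a loop.
    queue = deque([to])
    seen = {id(to)}
    while queue:
        node = queue.popleft()
        for ref in node['children']:
            if ref is c:
                return True
            child = ref['bs']
            if id(child) not in seen:  # don't revisit shared nodes
                seen.add(id(child))
                queue.append(child)
    return False
```

The `seen` set plays the role of the patch's `g_list_find(queue, ...)`
check, preventing re-enqueueing a node reachable via two paths.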

* [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-14 14:36   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Correctly rewrite the default argument. After the patch, the function
can be used for image formats other than the test's default one.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 9595429fea..76877ad584 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -139,7 +139,12 @@ def qemu_io(*args):
 
 def qemu_io_silent(*args):
     '''Run qemu-io and return the exit code, suppressing stdout'''
-    args = qemu_io_args + list(args)
+    if '-f' in qemu_io_args and '-f' in args:
+        ind = qemu_io_args.index('-f')
+        args = qemu_io_args[:ind] + qemu_io_args[ind+2:] + list(args)
+    else:
+        args = qemu_io_args + list(args)
+
     exitcode = subprocess.call(args, stdout=open('/dev/null', 'w'))
     if exitcode < 0:
         sys.stderr.write('qemu-io received signal %i: %s\n' %
-- 
2.18.0
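The override logic is easy to exercise in isolation (`merge_args` is a
hypothetical helper mirroring the patched branch of qemu_io_silent):

```python
def merge_args(default_args, extra_args):
    # If the caller passes its own -f, drop the default '-f FMT' pair
    # so the caller's format wins; otherwise keep the defaults as-is.
    if '-f' in default_args and '-f' in extra_args:
        ind = default_args.index('-f')
        return default_args[:ind] + default_args[ind + 2:] + list(extra_args)
    return default_args + list(extra_args)
```

So a test that needs a raw scratch image can pass `-f raw` even though
the harness's default `qemu_io_args` already pin `-f` to the
test-chosen format.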

* [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-14 14:46   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

After node graph changes, we may not be able to resume_drive by device
name (backing files are not searched recursively). So, let's allow
resuming by node name. Use a constant name for breakpoints, to avoid
introducing extra parameters.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/iotests.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 76877ad584..c9779f432f 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -415,11 +415,11 @@ class VM(qtest.QEMUQtestMachine):
             self.pause_drive(drive, "write_aio")
             return
         self.qmp('human-monitor-command',
-                    command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
+                    command_line='qemu-io %s "break %s bp_0"' % (drive, event))
 
     def resume_drive(self, drive):
         self.qmp('human-monitor-command',
-                    command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
+                    command_line='qemu-io %s "remove_break bp_0"' % (drive))
 
     def hmp_qemu_io(self, drive, cmd):
         '''Write to a given drive using an HMP command'''
@@ -543,13 +543,14 @@ class QMPTestCase(unittest.TestCase):
         self.assertEqual(self.vm.flatten_qmp_object(json.loads(json_filename[5:])),
                          self.vm.flatten_qmp_object(reference))
 
-    def cancel_and_wait(self, drive='drive0', force=False, resume=False):
+    def cancel_and_wait(self, drive='drive0', force=False, resume=False,
+                        resume_node=None):
         '''Cancel a block job and wait for it to finish, returning the event'''
         result = self.vm.qmp('block-job-cancel', device=drive, force=force)
         self.assert_qmp(result, 'return', {})
 
         if resume:
-            self.vm.resume_drive(drive)
+            self.vm.resume_drive(resume_node or drive)
 
         cancelled = False
         result = None
-- 
2.18.0

* [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 13:48   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Backup will append a fleecing-hook node above the source node, so we
can't resume by device name (because resume doesn't search recursively
through the backing chain).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 tests/qemu-iotests/055 | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index 3437c11507..be5451e1c5 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -48,7 +48,8 @@ class TestSingleDrive(iotests.QMPTestCase):
     def setUp(self):
         qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, str(image_len))
 
-        self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
+        self.vm = iotests.VM().add_drive('blkdebug::' + test_img,
+                                         'node-name=source')
         self.vm.add_drive(blockdev_target_img, interface="none")
         if iotests.qemu_default_machine == 'pc':
             self.vm.add_drive(None, 'media=cdrom', 'ide')
@@ -69,7 +70,7 @@ class TestSingleDrive(iotests.QMPTestCase):
         result = self.vm.qmp(cmd, device='drive0', target=target, sync='full')
         self.assert_qmp(result, 'return', {})
 
-        event = self.cancel_and_wait(resume=True)
+        event = self.cancel_and_wait(resume=True, resume_node='source')
         self.assert_qmp(event, 'data/type', 'backup')
 
     def test_cancel_drive_backup(self):
@@ -87,7 +88,7 @@ class TestSingleDrive(iotests.QMPTestCase):
         self.assert_qmp(result, 'return', {})
 
         self.pause_job('drive0', wait=False)
-        self.vm.resume_drive('drive0')
+        self.vm.resume_drive('source')
         self.pause_wait('drive0')
 
         result = self.vm.qmp('query-block-jobs')
@@ -165,7 +166,8 @@ class TestSetSpeed(iotests.QMPTestCase):
     def setUp(self):
         qemu_img('create', '-f', iotests.imgfmt, blockdev_target_img, str(image_len))
 
-        self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
+        self.vm = iotests.VM().add_drive('blkdebug::' + test_img,
+                                         'node-name=source')
         self.vm.add_drive(blockdev_target_img, interface="none")
         self.vm.launch()
 
@@ -197,7 +199,7 @@ class TestSetSpeed(iotests.QMPTestCase):
         self.assert_qmp(result, 'return[0]/device', 'drive0')
         self.assert_qmp(result, 'return[0]/speed', 8 * 1024 * 1024)
 
-        event = self.cancel_and_wait(resume=True)
+        event = self.cancel_and_wait(resume=True, resume_node='source')
         self.assert_qmp(event, 'data/type', 'backup')
 
         # Check setting speed option works
@@ -210,7 +212,7 @@ class TestSetSpeed(iotests.QMPTestCase):
         self.assert_qmp(result, 'return[0]/device', 'drive0')
         self.assert_qmp(result, 'return[0]/speed', 4 * 1024 * 1024)
 
-        event = self.cancel_and_wait(resume=True)
+        event = self.cancel_and_wait(resume=True, resume_node='source')
         self.assert_qmp(event, 'data/type', 'backup')
 
     def test_set_speed_drive_backup(self):
@@ -236,7 +238,7 @@ class TestSetSpeed(iotests.QMPTestCase):
         result = self.vm.qmp('block-job-set-speed', device='drive0', speed=-1)
         self.assert_qmp(result, 'error/class', 'GenericError')
 
-        event = self.cancel_and_wait(resume=True)
+        event = self.cancel_and_wait(resume=True, resume_node='source')
         self.assert_qmp(event, 'data/type', 'backup')
 
     def test_set_speed_invalid_drive_backup(self):
@@ -464,7 +466,8 @@ class TestDriveCompression(iotests.QMPTestCase):
             pass
 
     def do_prepare_drives(self, fmt, args, attach_target):
-        self.vm = iotests.VM().add_drive('blkdebug::' + test_img)
+        self.vm = iotests.VM().add_drive('blkdebug::' + test_img,
+                                         'node-name=source')
 
         qemu_img('create', '-f', fmt, blockdev_target_img,
                  str(TestDriveCompression.image_len), *args)
@@ -507,7 +510,7 @@ class TestDriveCompression(iotests.QMPTestCase):
         result = self.vm.qmp(cmd, device='drive0', sync='full', compress=True, **args)
         self.assert_qmp(result, 'return', {})
 
-        event = self.cancel_and_wait(resume=True)
+        event = self.cancel_and_wait(resume=True, resume_node='source')
         self.assert_qmp(event, 'data/type', 'backup')
 
         self.vm.shutdown()
@@ -532,7 +535,7 @@ class TestDriveCompression(iotests.QMPTestCase):
         self.assert_qmp(result, 'return', {})
 
         self.pause_job('drive0', wait=False)
-        self.vm.resume_drive('drive0')
+        self.vm.resume_drive('source')
         self.pause_wait('drive0')
 
         result = self.vm.qmp('query-block-jobs')
-- 
2.18.0

* [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:02   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

The backup-top filter does the copy-before-write operation. It is
inserted above the active disk and has a target node for CBW, like the
following:

    +-------+
    | Guest |
    +---+---+
        |r,w
        v
    +---+-----------+  target   +---------------+
    | backup_top    |---------->| target(qcow2) |
    +---+-----------+   CBW     +---+-----------+
        |
backing |r,w
        v
    +---+---------+
    | Active disk |
    +-------------+

The driver will be used in backup instead of write-notifiers.
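The CBW flow in the diagram can be sketched as follows (a simplified
in-memory model; the buffers, the `dirty` set of cluster offsets, and
the function name are illustrative, not the driver's API):

```python
def cbw_write(offset, data, backing, target, dirty, cluster):
    # Copy-before-write: before a guest write reaches the backing node,
    # save the _old_ contents of every not-yet-copied cluster the write
    # touches to the target, then drop those clusters from the bitmap.
    start = offset - offset % cluster            # round down
    end = offset + len(data)
    end += -end % cluster                        # round up
    for off in range(start, end, cluster):
        if off in dirty:
            target[off:off + cluster] = backing[off:off + cluster]
            dirty.discard(off)
    # only now let the guest write modify the backing node
    backing[offset:offset + len(data)] = data
```

Once a cluster's old data has been copied (or the cluster was never
marked for copying), subsequent guest writes to it pass straight
through to the backing node.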

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup-top.h  |  43 +++++++
 block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
 block/Makefile.objs |   2 +
 3 files changed, 351 insertions(+)
 create mode 100644 block/backup-top.h
 create mode 100644 block/backup-top.c

diff --git a/block/backup-top.h b/block/backup-top.h
new file mode 100644
index 0000000000..abe4dda574
--- /dev/null
+++ b/block/backup-top.h
@@ -0,0 +1,43 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "block/block_int.h"
+
+typedef struct BDRVBackupTopState {
+    HBitmap *copy_bitmap; /* what should be copied to @target on guest write. */
+    BdrvChild *target;
+
+    uint64_t bytes_copied;
+} BDRVBackupTopState;
+
+void bdrv_backup_top_drop(BlockDriverState *bs);
+uint64_t bdrv_backup_top_progress(BlockDriverState *bs);
+
+BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
+                                         BlockDriverState *target,
+                                         HBitmap *copy_bitmap,
+                                         Error **errp);
diff --git a/block/backup-top.c b/block/backup-top.c
new file mode 100644
index 0000000000..0e7b3e3142
--- /dev/null
+++ b/block/backup-top.c
@@ -0,0 +1,306 @@
+/*
+ * backup-top filter driver
+ *
+ * The driver performs Copy-Before-Write (CBW) operation: it is injected above
+ * some node, and before each write it copies _old_ data to the target node.
+ *
+ * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/cutils.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/qdict.h"
+
+#include "block/backup-top.h"
+
+static coroutine_fn int backup_top_co_preadv(
+        BlockDriverState *bs, uint64_t offset, uint64_t bytes,
+        QEMUIOVector *qiov, int flags)
+{
+    /*
+     * Features to be implemented:
+     * F1. COR: save read data to the fleecing target for fast access
+     *     (to reduce reads). This could possibly be done with the
+     *     copy-on-read filter, but we need the ability to make COR requests
+     *     optional: for example, if the target is a ram-cache and it is
+     *     currently full, we should skip the COR request, as it is not
+     *     actually necessary.
+     *
+     * F2. Feature for the guest: read from the fleecing target if the data
+     *     is in the ram-cache and is unchanged
+     */
+
+    return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
+}
+
+static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
+                                          uint64_t bytes)
+{
+    int ret = 0;
+    BDRVBackupTopState *s = bs->opaque;
+    uint64_t gran = 1UL << hbitmap_granularity(s->copy_bitmap);
+    uint64_t end = QEMU_ALIGN_UP(offset + bytes, gran);
+    uint64_t off = QEMU_ALIGN_DOWN(offset, gran), len;
+    size_t align = MAX(bdrv_opt_mem_align(bs->backing->bs),
+                       bdrv_opt_mem_align(s->target->bs));
+    struct iovec iov = {
+        .iov_base = qemu_memalign(align, end - off),
+        .iov_len = end - off
+    };
+    QEMUIOVector qiov;
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /*
+     * Features to be implemented:
+     * F3. parallelize copying loop
+     * F4. detect zeros
+     * F5. use block_status ?
+     * F6. don't copy clusters which are already cached by COR [see F1]
+     * F7. if the target is a ram-cache and it is full, there should be a
+     *     possibility to drop unnecessary data (cached by COR [see F1]) to
+     *     handle CBW quickly.
+     */
+
+    len = end - off;
+    while (hbitmap_next_dirty_area(s->copy_bitmap, &off, &len)) {
+        iov.iov_len = qiov.size = len;
+
+        hbitmap_reset(s->copy_bitmap, off, len);
+
+        ret = bdrv_co_preadv(bs->backing, off, len, &qiov,
+                             BDRV_REQ_NO_SERIALISING);
+        if (ret < 0) {
+            hbitmap_set(s->copy_bitmap, off, len);
+            goto finish;
+        }
+
+        ret = bdrv_co_pwritev(s->target, off, len, &qiov, BDRV_REQ_SERIALISING);
+        if (ret < 0) {
+            hbitmap_set(s->copy_bitmap, off, len);
+            goto finish;
+        }
+
+        s->bytes_copied += len;
+        off += len;
+        if (off >= end) {
+            break;
+        }
+        len = end - off;
+    }
+
+finish:
+    qemu_vfree(iov.iov_base);
+
+    /*
+     * F8. We fail the guest request in case of error. Alternatively, we
+     * could fail the copying process instead, retry several times, or
+     * perhaps pause the guest, etc.
+     */
+    return ret;
+}
+
+static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
+                                                  int64_t offset, int bytes)
+{
+    int ret = backup_top_cbw(bs, offset, bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /*
+     * Features to be implemented:
+     * F9. possibility of lazy discard: just defer the discard until fleecing
+     *     completion. If a write (or a new discard) occurs to the same area,
+     *     simply drop the deferred discard.
+     */
+
+    return bdrv_co_pdiscard(bs->backing, offset, bytes);
+}
+
+static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
+        int64_t offset, int bytes, BdrvRequestFlags flags)
+{
+    int ret = backup_top_cbw(bs, offset, bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
+}
+
+static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
+                                                 uint64_t offset,
+                                                 uint64_t bytes,
+                                                 QEMUIOVector *qiov, int flags)
+{
+    int ret = backup_top_cbw(bs, offset, bytes);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
+{
+    if (!bs->backing) {
+        return 0;
+    }
+
+    return bdrv_co_flush(bs->backing->bs);
+}
+
+static void backup_top_refresh_filename(BlockDriverState *bs, QDict *opts)
+{
+    if (bs->backing == NULL) {
+        /*
+         * We can get here after a failed bdrv_attach_child in
+         * bdrv_set_backing_hd.
+         */
+        return;
+    }
+    bdrv_refresh_filename(bs->backing->bs);
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->backing->bs->filename);
+}
+
+static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                       const BdrvChildRole *role,
+                                       BlockReopenQueue *reopen_queue,
+                                       uint64_t perm, uint64_t shared,
+                                       uint64_t *nperm, uint64_t *nshared)
+{
+    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, nperm,
+                              nshared);
+
+    if (role == &child_file) {
+        /*
+         * Share writes to the target, so as not to interfere with guest
+         * writes to its disk, which will be in the target's backing chain.
+         */
+        *nshared = *nshared | BLK_PERM_WRITE;
+        *nperm = *nperm | BLK_PERM_WRITE;
+    } else {
+        *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+    }
+}
+
+BlockDriver bdrv_backup_top_filter = {
+    .format_name = "backup-top",
+    .instance_size = sizeof(BDRVBackupTopState),
+
+    .bdrv_co_preadv             = backup_top_co_preadv,
+    .bdrv_co_pwritev            = backup_top_co_pwritev,
+    .bdrv_co_pwrite_zeroes      = backup_top_co_pwrite_zeroes,
+    .bdrv_co_pdiscard           = backup_top_co_pdiscard,
+    .bdrv_co_flush              = backup_top_co_flush,
+
+    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
+
+    .bdrv_refresh_filename      = backup_top_refresh_filename,
+
+    .bdrv_child_perm            = backup_top_child_perm,
+
+    .is_filter = true,
+};
+
+BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
+                                         BlockDriverState *target,
+                                         HBitmap *copy_bitmap,
+                                         Error **errp)
+{
+    Error *local_err = NULL;
+    BDRVBackupTopState *state;
+    BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter,
+                                                 NULL, BDRV_O_RDWR, errp);
+
+    if (!top) {
+        return NULL;
+    }
+
+    top->implicit = true;
+    top->total_sectors = source->total_sectors;
+    top->opaque = state = g_new0(BDRVBackupTopState, 1);
+    state->copy_bitmap = copy_bitmap;
+
+    bdrv_ref(target);
+    state->target = bdrv_attach_child(top, target, "target", &child_file, errp);
+    if (!state->target) {
+        bdrv_unref(target);
+        bdrv_unref(top);
+        return NULL;
+    }
+
+    bdrv_set_aio_context(top, bdrv_get_aio_context(source));
+    bdrv_set_aio_context(target, bdrv_get_aio_context(source));
+
+    bdrv_drained_begin(source);
+
+    bdrv_ref(top);
+    bdrv_append(top, source, &local_err);
+
+    if (local_err) {
+        bdrv_unref(top);
+    }
+
+    bdrv_drained_end(source);
+
+    if (local_err) {
+        bdrv_unref_child(top, state->target);
+        bdrv_unref(top);
+        error_propagate(errp, local_err);
+        return NULL;
+    }
+
+    return top;
+}
+
+void bdrv_backup_top_drop(BlockDriverState *bs)
+{
+    BDRVBackupTopState *s = bs->opaque;
+
+    AioContext *aio_context = bdrv_get_aio_context(bs);
+
+    aio_context_acquire(aio_context);
+
+    bdrv_drained_begin(bs);
+
+    bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, &error_abort);
+    bdrv_replace_node(bs, backing_bs(bs), &error_abort);
+    bdrv_set_backing_hd(bs, NULL, &error_abort);
+
+    bdrv_drained_end(bs);
+
+    if (s->target) {
+        bdrv_unref_child(bs, s->target);
+    }
+    bdrv_unref(bs);
+
+    aio_context_release(aio_context);
+}
+
+uint64_t bdrv_backup_top_progress(BlockDriverState *bs)
+{
+    BDRVBackupTopState *s = bs->opaque;
+
+    return s->bytes_copied;
+}
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 7a81892a52..9072115c09 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -40,6 +40,8 @@ block-obj-y += throttle.o copy-on-read.o
 
 block-obj-y += crypto.o
 
+block-obj-y += backup-top.o
+
 common-obj-y += stream.o
 
 nfs.o-libs         := $(LIBNFS_LIBS)
-- 
2.18.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:18   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Split out do_wait_serialising_requests with the additional possibility of
not actually waiting, but just checking whether there is something to wait
for.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/io.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index bd9d688f8b..b87c11a6ec 100644
--- a/block/io.c
+++ b/block/io.c
@@ -720,12 +720,13 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
     bdrv_wakeup(bs);
 }
 
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn do_wait_serialising_requests(BdrvTrackedRequest *self,
+                                                      bool wait)
 {
     BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
     bool retry;
-    bool waited = false;
+    bool found = false;
 
     if (!atomic_read(&bs->serialising_in_flight)) {
         return false;
@@ -751,11 +752,13 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
                  * will wait for us as soon as it wakes up, then just go on
                  * (instead of producing a deadlock in the former case). */
                 if (!req->waiting_for) {
-                    self->waiting_for = req;
-                    qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
-                    self->waiting_for = NULL;
-                    retry = true;
-                    waited = true;
+                    found = true;
+                    if (wait) {
+                        self->waiting_for = req;
+                        qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
+                        self->waiting_for = NULL;
+                        retry = true;
+                    }
                     break;
                 }
             }
@@ -763,7 +766,12 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
         qemu_co_mutex_unlock(&bs->reqs_lock);
     } while (retry);
 
-    return waited;
+    return found;
+}
+
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+{
+    return do_wait_serialising_requests(self, true);
 }
 
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
-- 
2.18.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-16 16:36   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

From: Vladimir Sementsov-Ogievskiy <etendren@gmail.com>

Introduce lock/unlock range functionality, based on serialising
requests. This is needed to refactor backup, dropping its local
tracked-request-like synchronization.

Signed-off-by: Vladimir Sementsov-Ogievskiy <etendren@gmail.com>
---
 include/block/block_int.h |  4 ++++
 block/io.c                | 45 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index f605622216..3475ee5360 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -64,6 +64,7 @@ enum BdrvTrackedRequestType {
     BDRV_TRACKED_WRITE,
     BDRV_TRACKED_DISCARD,
     BDRV_TRACKED_TRUNCATE,
+    BDRV_TRACKED_LOCK,
 };
 
 typedef struct BdrvTrackedRequest {
@@ -856,6 +857,9 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
     int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags);
+void *coroutine_fn bdrv_co_try_lock(BdrvChild *child,
+                                    int64_t offset, unsigned int bytes);
+void coroutine_fn bdrv_co_unlock(void *opaque);
 
 extern unsigned int bdrv_drain_all_count;
 void bdrv_apply_subtree_drain(BdrvChild *child, BlockDriverState *new_parent);
diff --git a/block/io.c b/block/io.c
index b87c11a6ec..38a63d16f1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -720,6 +720,15 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
     bdrv_wakeup(bs);
 }
 
+static bool ignore_intersection(BdrvTrackedRequest *a, BdrvTrackedRequest *b)
+{
+    return a == b || (!a->serialising && !b->serialising) ||
+        (a->type == BDRV_TRACKED_LOCK && b->type == BDRV_TRACKED_READ &&
+         !b->serialising) ||
+        (b->type == BDRV_TRACKED_LOCK && a->type == BDRV_TRACKED_READ &&
+         !a->serialising);
+}
+
 static bool coroutine_fn do_wait_serialising_requests(BdrvTrackedRequest *self,
                                                       bool wait)
 {
@@ -736,7 +745,7 @@ static bool coroutine_fn do_wait_serialising_requests(BdrvTrackedRequest *self,
         retry = false;
         qemu_co_mutex_lock(&bs->reqs_lock);
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
-            if (req == self || (!req->serialising && !self->serialising)) {
+            if (ignore_intersection(self, req)) {
                 continue;
             }
             if (tracked_request_overlaps(req, self->overlap_offset,
@@ -774,6 +783,12 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
     return do_wait_serialising_requests(self, true);
 }
 
+static bool coroutine_fn should_wait_serialising_requests(
+        BdrvTrackedRequest *self)
+{
+    return do_wait_serialising_requests(self, false);
+}
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
                                    size_t size)
 {
@@ -3250,3 +3265,31 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, PreallocMode prealloc,
 
     return tco.ret;
 }
+
+void *coroutine_fn bdrv_co_try_lock(BdrvChild *child,
+                                    int64_t offset, unsigned int bytes)
+{
+    BlockDriverState *bs = child->bs;
+    BdrvTrackedRequest *req = g_new(BdrvTrackedRequest, 1);
+
+    tracked_request_begin(req, bs, offset, bytes, BDRV_TRACKED_LOCK);
+    mark_request_serialising(req, bdrv_get_cluster_size(bs));
+
+    if (should_wait_serialising_requests(req)) {
+        tracked_request_end(req);
+        g_free(req);
+        return NULL;
+    }
+
+    return req;
+}
+
+void coroutine_fn bdrv_co_unlock(void *opaque)
+{
+    BdrvTrackedRequest *req = opaque;
+
+    assert(req->type == BDRV_TRACKED_LOCK);
+
+    tracked_request_end(req);
+    g_free(req);
+}
-- 
2.18.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-18 13:00   ` Max Reitz
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
  2019-01-23 15:26 ` [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup no-reply
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Move the copy-bitmap creation code. It's needed for the following
commit, as we'll need copy_bitmap before the actual block job creation.
Do it in a separate commit to simplify review.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 69 +++++++++++++++++++++++++++++---------------------
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 0b3fddeb6c..88c0242b4e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -561,6 +561,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     BlockDriverInfo bdi;
     BackupBlockJob *job = NULL;
     int ret;
+    int64_t cluster_size;
+    HBitmap *copy_bitmap = NULL;
 
     assert(bs);
     assert(target);
@@ -615,6 +617,35 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         return NULL;
     }
 
+    /*
+     * If there is no backing file on the target, we cannot rely on COW if our
+     * backup cluster size is smaller than the target cluster size. Even for
+     * targets with a backing file, try to avoid COW if possible.
+     */
+    ret = bdrv_get_info(target, &bdi);
+    if (ret == -ENOTSUP && !target->backing) {
+        /* Cluster size is not defined */
+        warn_report("The target block device doesn't provide "
+                    "information about the block size and it doesn't have a "
+                    "backing file. The default block size of %u bytes is "
+                    "used. If the actual block size of the target exceeds "
+                    "this default, the backup may be unusable",
+                    BACKUP_CLUSTER_SIZE_DEFAULT);
+        cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
+    } else if (ret < 0 && !target->backing) {
+        error_setg_errno(errp, -ret,
+            "Couldn't determine the cluster size of the target image, "
+            "which has no backing file");
+        error_append_hint(errp,
+            "Aborting, since this may create an unusable destination image\n");
+        return NULL;
+    } else if (ret < 0 && target->backing) {
+        /* Not fatal; just trudge on ahead. */
+        cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
+    } else {
+        cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+    }
+
     len = bdrv_getlength(bs);
     if (len < 0) {
         error_setg_errno(errp, -len, "unable to get length for '%s'",
@@ -622,6 +653,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         goto error;
     }
 
+    copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
+
     /* job->len is fixed, so we can't allow resize */
     job = block_job_create(job_id, &backup_job_driver, txn, bs,
                            BLK_PERM_CONSISTENT_READ,
@@ -650,35 +683,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
 
     /* Detect image-fleecing (and similar) schemes */
     job->serialize_target_writes = bdrv_chain_contains(target, bs);
-
-    /* If there is no backing file on the target, we cannot rely on COW if our
-     * backup cluster size is smaller than the target cluster size. Even for
-     * targets with a backing file, try to avoid COW if possible. */
-    ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target->backing) {
-        /* Cluster size is not defined */
-        warn_report("The target block device doesn't provide "
-                    "information about the block size and it doesn't have a "
-                    "backing file. The default block size of %u bytes is "
-                    "used. If the actual block size of the target exceeds "
-                    "this default, the backup may be unusable",
-                    BACKUP_CLUSTER_SIZE_DEFAULT);
-        job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target->backing) {
-        error_setg_errno(errp, -ret,
-            "Couldn't determine the cluster size of the target image, "
-            "which has no backing file");
-        error_append_hint(errp,
-            "Aborting, since this may create an unusable destination image\n");
-        goto error;
-    } else if (ret < 0 && target->backing) {
-        /* Not fatal; just trudge on ahead. */
-        job->cluster_size = BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else {
-        job->cluster_size = MAX(BACKUP_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
-    }
-
-    job->copy_bitmap = hbitmap_alloc(len, ctz32(job->cluster_size));
+    job->cluster_size = cluster_size;
+    job->copy_bitmap = copy_bitmap;
+    copy_bitmap = NULL;
     job->use_copy_range = true;
     job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
                                         blk_get_max_transfer(job->target));
@@ -694,6 +701,10 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     return &job->common;
 
  error:
+    if (copy_bitmap) {
+        assert(!job || !job->copy_bitmap);
+        hbitmap_free(copy_bitmap);
+    }
     if (sync_bitmap) {
         bdrv_reclaim_dirty_bitmap(bs, sync_bitmap, NULL);
     }
-- 
2.18.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create Vladimir Sementsov-Ogievskiy
@ 2018-12-29 12:20 ` Vladimir Sementsov-Ogievskiy
  2019-01-18 14:56   ` Max Reitz
  2019-01-23 15:26 ` [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup no-reply
  11 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2018-12-29 12:20 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, mreitz, kwolf, vsementsov, den, eblake, jsnow

Drop write notifiers and use a filter node instead. Changes:

1. Copy-before-write is now handled by the filter node, so drop all
   is_write_notifier arguments.

2. We don't have intersecting requests anymore, so their handling is
dropped. Instead, synchronization works as follows: when backup or
backup-top starts copying some area, it first clears the corresponding
copy-bitmap bits, and nobody touches areas that are not marked with
dirty bits in the copy-bitmap, so there is no intersection. Also,
backup job copy operations are surrounded by a bdrv region lock, which
is actually a serialising request, so as not to interfere with guest
writes and not to read changed data from the source (before reading we
clear the corresponding bit in the copy-bitmap, so this area is no
longer handled by backup-top).

3. To sync with in-flight requests we now just drain the hook node; we
don't need the rw-lock.

4. After the whole backup loop (top, full and incremental modes), we
need to check for clusters that were not copied: those which were under
a backup-top operation that we skipped, but the backup-top operation
failed, the error was returned to the guest, and the dirty bits were
set back.

5. Don't create an additional blk; use the backup-top children for copy
operations.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
 1 file changed, 149 insertions(+), 136 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 88c0242b4e..e332909fb7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -26,105 +26,61 @@
 #include "qemu/bitmap.h"
 #include "qemu/error-report.h"
 
-#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
+#include "block/backup-top.h"
 
-typedef struct CowRequest {
-    int64_t start_byte;
-    int64_t end_byte;
-    QLIST_ENTRY(CowRequest) list;
-    CoQueue wait_queue; /* coroutines blocked on this request */
-} CowRequest;
+#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
 
 typedef struct BackupBlockJob {
     BlockJob common;
-    BlockBackend *target;
+    BdrvChild *source;
+    BdrvChild *target;
     /* bitmap for sync=incremental */
     BdrvDirtyBitmap *sync_bitmap;
     MirrorSyncMode sync_mode;
     BlockdevOnError on_source_error;
     BlockdevOnError on_target_error;
-    CoRwlock flush_rwlock;
     uint64_t len;
     uint64_t bytes_read;
     int64_t cluster_size;
     bool compress;
-    NotifierWithReturn before_write;
-    QLIST_HEAD(, CowRequest) inflight_reqs;
 
     HBitmap *copy_bitmap;
     bool use_copy_range;
     int64_t copy_range_size;
 
     bool serialize_target_writes;
+
+    BlockDriverState *backup_top;
+    uint64_t backup_top_progress;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
 
-/* See if in-flight requests overlap and wait for them to complete */
-static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
-                                                       int64_t start,
-                                                       int64_t end)
-{
-    CowRequest *req;
-    bool retry;
-
-    do {
-        retry = false;
-        QLIST_FOREACH(req, &job->inflight_reqs, list) {
-            if (end > req->start_byte && start < req->end_byte) {
-                qemu_co_queue_wait(&req->wait_queue, NULL);
-                retry = true;
-                break;
-            }
-        }
-    } while (retry);
-}
-
-/* Keep track of an in-flight request */
-static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
-                              int64_t start, int64_t end)
-{
-    req->start_byte = start;
-    req->end_byte = end;
-    qemu_co_queue_init(&req->wait_queue);
-    QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
-}
-
-/* Forget about a completed request */
-static void cow_request_end(CowRequest *req)
-{
-    QLIST_REMOVE(req, list);
-    qemu_co_queue_restart_all(&req->wait_queue);
-}
-
 /* Copy range to target with a bounce buffer and return the bytes copied. If
  * error occurred, return a negative error number */
 static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
                                                       int64_t start,
                                                       int64_t end,
-                                                      bool is_write_notifier,
                                                       bool *error_is_read,
                                                       void **bounce_buffer)
 {
     int ret;
     struct iovec iov;
     QEMUIOVector qiov;
-    BlockBackend *blk = job->common.blk;
     int nbytes;
-    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
     int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
     hbitmap_reset(job->copy_bitmap, start, job->cluster_size);
     nbytes = MIN(job->cluster_size, job->len - start);
     if (!*bounce_buffer) {
-        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
+        *bounce_buffer = qemu_blockalign(job->source->bs, job->cluster_size);
     }
     iov.iov_base = *bounce_buffer;
     iov.iov_len = nbytes;
     qemu_iovec_init_external(&qiov, &iov, 1);
 
-    ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);
+    ret = bdrv_co_preadv(job->source, start, qiov.size, &qiov, 0);
     if (ret < 0) {
         trace_backup_do_cow_read_fail(job, start, ret);
         if (error_is_read) {
@@ -134,12 +90,12 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     }
 
     if (qemu_iovec_is_zero(&qiov)) {
-        ret = blk_co_pwrite_zeroes(job->target, start,
-                                   qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);
+        ret = bdrv_co_pwrite_zeroes(job->target, start, qiov.size,
+                                    write_flags | BDRV_REQ_MAY_UNMAP);
     } else {
-        ret = blk_co_pwritev(job->target, start,
-                             qiov.size, &qiov, write_flags |
-                             (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
+        ret = bdrv_co_pwritev(job->target, start,
+                              qiov.size, &qiov, write_flags |
+                              (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
     }
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
@@ -159,15 +115,11 @@ fail:
 /* Copy range to target and return the bytes copied. If error occurred, return a
  * negative error number. */
 static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
-                                                int64_t start,
-                                                int64_t end,
-                                                bool is_write_notifier)
+                                                int64_t start, int64_t end)
 {
     int ret;
     int nr_clusters;
-    BlockBackend *blk = job->common.blk;
     int nbytes;
-    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
     int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
@@ -175,8 +127,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     nbytes = MIN(job->copy_range_size, end - start);
     nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
     hbitmap_reset(job->copy_bitmap, start, job->cluster_size * nr_clusters);
-    ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            read_flags, write_flags);
+    ret = bdrv_co_copy_range(job->source, start, job->target, start, nbytes,
+                             0, write_flags);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         hbitmap_set(job->copy_bitmap, start, job->cluster_size * nr_clusters);
@@ -188,24 +140,18 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
 
 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                       int64_t offset, uint64_t bytes,
-                                      bool *error_is_read,
-                                      bool is_write_notifier)
+                                      bool *error_is_read)
 {
-    CowRequest cow_request;
     int ret = 0;
     int64_t start, end; /* bytes */
     void *bounce_buffer = NULL;
-
-    qemu_co_rwlock_rdlock(&job->flush_rwlock);
+    uint64_t backup_top_progress;
 
     start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
     end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);
 
     trace_backup_do_cow_enter(job, start, offset, bytes);
 
-    wait_for_overlapping_requests(job, start, end);
-    cow_request_begin(&cow_request, job, start, end);
-
     while (start < end) {
         if (!hbitmap_get(job->copy_bitmap, start)) {
             trace_backup_do_cow_skip(job, start);
@@ -216,13 +162,13 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         trace_backup_do_cow_process(job, start);
 
         if (job->use_copy_range) {
-            ret = backup_cow_with_offload(job, start, end, is_write_notifier);
+            ret = backup_cow_with_offload(job, start, end);
             if (ret < 0) {
                 job->use_copy_range = false;
             }
         }
         if (!job->use_copy_range) {
-            ret = backup_cow_with_bounce_buffer(job, start, end, is_write_notifier,
+            ret = backup_cow_with_bounce_buffer(job, start, end,
                                                 error_is_read, &bounce_buffer);
         }
         if (ret < 0) {
@@ -234,7 +180,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
          */
         start += ret;
         job->bytes_read += ret;
-        job_progress_update(&job->common.job, ret);
+        backup_top_progress = bdrv_backup_top_progress(job->backup_top);
+        job_progress_update(&job->common.job, ret + backup_top_progress -
+                            job->backup_top_progress);
+        job->backup_top_progress = backup_top_progress;
         ret = 0;
     }
 
@@ -242,33 +191,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
         qemu_vfree(bounce_buffer);
     }
 
-    cow_request_end(&cow_request);
-
     trace_backup_do_cow_return(job, offset, bytes, ret);
 
-    qemu_co_rwlock_unlock(&job->flush_rwlock);
-
     return ret;
 }
 
-static int coroutine_fn backup_before_write_notify(
-        NotifierWithReturn *notifier,
-        void *opaque)
-{
-    BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
-    BdrvTrackedRequest *req = opaque;
-
-    assert(req->bs == blk_bs(job->common.blk));
-    assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
-    assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
-
-    return backup_do_cow(job, req->offset, req->bytes, NULL, true);
-}
-
 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
 {
     BdrvDirtyBitmap *bm;
-    BlockDriverState *bs = blk_bs(job->common.blk);
+    BlockDriverState *bs = job->source->bs;
 
     if (ret < 0) {
         /* Merge the successor back into the parent, delete nothing. */
@@ -300,21 +231,23 @@ static void backup_abort(Job *job)
 static void backup_clean(Job *job)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
-    assert(s->target);
-    blk_unref(s->target);
+
+    /* We must clear it to avoid a crash in backup_drain(). */
     s->target = NULL;
 
     if (s->copy_bitmap) {
         hbitmap_free(s->copy_bitmap);
         s->copy_bitmap = NULL;
     }
+
+    bdrv_backup_top_drop(s->backup_top);
 }
 
 static void backup_attached_aio_context(BlockJob *job, AioContext *aio_context)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common);
 
-    blk_set_aio_context(s->target, aio_context);
+    bdrv_set_aio_context(s->target->bs, aio_context);
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -340,10 +273,10 @@ static void backup_drain(BlockJob *job)
      * of backup_complete...
      */
     if (s->target) {
-        BlockBackend *target = s->target;
-        blk_ref(target);
-        blk_drain(target);
-        blk_unref(target);
+        BlockDriverState *target = s->target->bs;
+        bdrv_ref(target);
+        bdrv_drain(target);
+        bdrv_unref(target);
     }
 }
 
@@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     bool error_is_read;
     int64_t offset;
     HBitmapIter hbi;
+    void *lock = NULL;
 
     hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
-    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
+    while (hbitmap_count(job->copy_bitmap)) {
+        offset = hbitmap_iter_next(&hbi);
+        if (offset == -1) {
+            /*
+             * We may have skipped some clusters which backup-top started to
+             * handle, but failed: it returned an error to the guest and set
+             * the dirty bits back.
+             */
+            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
+            offset = hbitmap_iter_next(&hbi);
+            assert(offset >= 0);
+        }
+
+        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
+        /*
+         * Dirty bit is set, which means that there are no in-flight
+         * write requests on this area. We must succeed.
+         */
+        assert(lock);
+
         do {
             if (yield_and_check(job)) {
+                bdrv_co_unlock(lock);
                 return 0;
             }
-            ret = backup_do_cow(job, offset,
-                                job->cluster_size, &error_is_read, false);
+            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
             if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
                            BLOCK_ERROR_ACTION_REPORT)
             {
+                bdrv_co_unlock(lock);
                 return ret;
             }
         } while (ret < 0);
+
+        bdrv_co_unlock(lock);
+        lock = NULL;
     }
 
     return 0;
@@ -432,12 +389,10 @@ static void backup_incremental_init_copy_bitmap(BackupBlockJob *job)
 static int coroutine_fn backup_run(Job *job, Error **errp)
 {
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
-    BlockDriverState *bs = blk_bs(s->common.blk);
+    BlockDriverState *bs = s->source->bs;
     int64_t offset;
     int ret = 0;
-
-    QLIST_INIT(&s->inflight_reqs);
-    qemu_co_rwlock_init(&s->flush_rwlock);
+    uint64_t backup_top_progress;
 
     job_progress_set_remaining(job, s->len);
 
@@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
         hbitmap_set(s->copy_bitmap, 0, s->len);
     }
 
-    s->before_write.notify = backup_before_write_notify;
-    bdrv_add_before_write_notifier(bs, &s->before_write);
-
     if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
         /* All bits are set in copy_bitmap to allow any cluster to be copied.
          * This does not actually require them to be copied. */
         while (!job_is_cancelled(job)) {
-            /* Yield until the job is cancelled.  We just let our before_write
-             * notify callback service CoW requests. */
+            /*
+             * Yield until the job is cancelled.  We just let our backup-top
+             * filter driver service CBW requests.
+             */
             job_yield(job);
         }
     } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
         ret = backup_run_incremental(s);
     } else {
+        bool retry;
+        void *lock;
+
+iteration:
+        retry = false;
+        lock = NULL;
+
         /* Both FULL and TOP SYNC_MODE's require copying.. */
         for (offset = 0; offset < s->len;
              offset += s->cluster_size) {
             bool error_is_read;
             int alloced = 0;
 
+            if (retry) {
+                retry = false;
+            } else if (lock) {
+                bdrv_co_unlock(lock);
+                lock = NULL;
+            }
+
             if (yield_and_check(s)) {
                 break;
             }
@@ -498,6 +466,7 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
                 /* If the above loop never found any sectors that are in
                  * the topmost image, skip this backup. */
                 if (alloced == 0) {
+                    hbitmap_reset(s->copy_bitmap, offset, s->cluster_size);
                     continue;
                 }
             }
@@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
             if (alloced < 0) {
                 ret = alloced;
             } else {
+                if (!hbitmap_get(s->copy_bitmap, offset)) {
+                    trace_backup_do_cow_skip(job, offset);
+                    continue; /* already copied */
+                }
+                if (!lock) {
+                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
+                    /*
+                     * Dirty bit is set, which means that there are no in-flight
+                     * write requests on this area. We must succeed.
+                     */
+                    assert(lock);
+                }
                 ret = backup_do_cow(s, offset, s->cluster_size,
-                                    &error_is_read, false);
+                                    &error_is_read);
             }
             if (ret < 0) {
                 /* Depending on error action, fail now or retry cluster */
@@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
                     break;
                 } else {
                     offset -= s->cluster_size;
+                    retry = true;
                     continue;
                 }
             }
         }
+        if (lock) {
+            bdrv_co_unlock(lock);
+            lock = NULL;
+        }
+        if (ret == 0 && !job_is_cancelled(job) &&
+            hbitmap_count(s->copy_bitmap))
+        {
+            /*
+             * We may have skipped some clusters which backup-top started to
+             * handle, but failed: it returned an error to the guest and set
+             * the dirty bits back.
+             */
+            goto iteration;
+        }
     }
 
-    notifier_with_return_remove(&s->before_write);
+    /* wait for pending CBW operations in backup-top */
+    bdrv_drain(s->backup_top);
 
-    /* wait until pending backup_do_cow() calls have completed */
-    qemu_co_rwlock_wrlock(&s->flush_rwlock);
-    qemu_co_rwlock_unlock(&s->flush_rwlock);
+    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
+    job_progress_update(job, ret + backup_top_progress -
+                        s->backup_top_progress);
+    s->backup_top_progress = backup_top_progress;
 
     return ret;
 }
@@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     int ret;
     int64_t cluster_size;
     HBitmap *copy_bitmap = NULL;
+    BlockDriverState *backup_top = NULL;
+    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
+                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
 
     assert(bs);
     assert(target);
@@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
 
     copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
 
-    /* job->len is fixed, so we can't allow resize */
-    job = block_job_create(job_id, &backup_job_driver, txn, bs,
-                           BLK_PERM_CONSISTENT_READ,
-                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
-                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
-                           speed, creation_flags, cb, opaque, errp);
-    if (!job) {
+    /*
+     * bdrv_get_device_name will not help to find device name starting from
+     * @bs after backup-top append, so let's calculate job_id before. Do
+     * it in the same way as block_job_create does.
+     */
+    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
+        job_id = bdrv_get_device_name(bs);
+    }
+
+    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
+    if (!backup_top) {
         goto error;
     }
 
-    /* The target must match the source in size, so no resize here either */
-    job->target = blk_new(BLK_PERM_WRITE,
-                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
-                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
-    ret = blk_insert_bs(job->target, target, errp);
-    if (ret < 0) {
+    /* job->len is fixed, so we can't allow resize */
+    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
+                           all_except_resize, speed, creation_flags,
+                           cb, opaque, errp);
+    if (!job) {
         goto error;
     }
 
+    job->source = backup_top->backing;
+    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
+
     job->on_source_error = on_source_error;
     job->on_target_error = on_target_error;
     job->sync_mode = sync_mode;
@@ -687,16 +694,19 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->copy_bitmap = copy_bitmap;
     copy_bitmap = NULL;
     job->use_copy_range = true;
-    job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
-                                        blk_get_max_transfer(job->target));
+    job->copy_range_size =
+            MIN_NON_ZERO(MIN_NON_ZERO(INT_MAX,
+                                      job->source->bs->bl.max_transfer),
+                         job->target->bs->bl.max_transfer);
     job->copy_range_size = MAX(job->cluster_size,
                                QEMU_ALIGN_UP(job->copy_range_size,
                                              job->cluster_size));
 
-    /* Required permissions are already taken with target's blk_new() */
-    block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
+    /* The target must match the source in size, so no resize here either */
+    block_job_add_bdrv(&job->common, "target", target, 0, all_except_resize,
                        &error_abort);
     job->len = len;
+    job->backup_top = backup_top;
 
     return &job->common;
 
@@ -712,6 +722,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
         backup_clean(&job->common.job);
         job_early_fail(&job->common.job);
     }
+    if (backup_top) {
+        bdrv_backup_top_drop(backup_top);
+    }
 
     return NULL;
 }
-- 
2.18.0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
@ 2019-01-14 13:10   ` Max Reitz
  2019-01-14 13:13     ` Max Reitz
  2019-01-14 14:01     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-14 13:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]

On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Simplify backup_incremental_init_copy_bitmap using the function
> bdrv_dirty_bitmap_next_dirty_area.
> 
> Note: move to job->len instead of bitmap size: it should not matter but
> less code.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 40 ++++++++++++----------------------------
>  1 file changed, 12 insertions(+), 28 deletions(-)

Overall: What is this function even supposed to do?  To me, it looks
like it marks all areas in job->copy_bitmap dirty that are dirty in
job->sync_bitmap.

If so, wouldn't just replacing this by hbitmap_merge() simplify things
further?

> diff --git a/block/backup.c b/block/backup.c
> index 435414e964..fbe7ce19e1 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)

[...]

> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
> +                                             &offset, &bytes))
> +    {
> +        uint64_t cluster = offset / job->cluster_size;
> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>  
> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
> -        if (next_cluster >= end) {
> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);

Why the +1?  Shouldn't the division for last_cluster round up instead?

> +
> +        offset = (last_cluster + 1) * job->cluster_size;

Same here.

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-14 13:10   ` Max Reitz
@ 2019-01-14 13:13     ` Max Reitz
  2019-01-14 14:01     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-14 13:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 14.01.19 14:10, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Simplify backup_incremental_init_copy_bitmap using the function
>> bdrv_dirty_bitmap_next_dirty_area.
>>
>> Note: move to job->len instead of bitmap size: it should not matter but
>> less code.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  block/backup.c | 40 ++++++++++++----------------------------
>>  1 file changed, 12 insertions(+), 28 deletions(-)
> 
> Overall: What is this function even supposed to do?  To me, it looks
> like it marks all areas in job->copy_bitmap dirty that are dirty in
> job->sync_bitmap.
> 
> If so, wouldn't just replacing this by hbitmap_merge() simplify things
> further?

Ah, no, because they would need to have the same granularity...

Max



* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-14 13:10   ` Max Reitz
  2019-01-14 13:13     ` Max Reitz
@ 2019-01-14 14:01     ` Vladimir Sementsov-Ogievskiy
  2019-01-14 14:13       ` Max Reitz
  1 sibling, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-14 14:01 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

14.01.2019 16:10, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Simplify backup_incremental_init_copy_bitmap using the function
>> bdrv_dirty_bitmap_next_dirty_area.
>>
>> Note: move to job->len instead of bitmap size: it should not matter but
>> less code.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/backup.c | 40 ++++++++++++----------------------------
>>   1 file changed, 12 insertions(+), 28 deletions(-)
> 
> Overall: What is this function even supposed to do?  To me, it looks
> like it marks all areas in job->copy_bitmap dirty that are dirty in
> job->sync_bitmap.
> 
> If so, wouldn't just replacing this by hbitmap_merge() simplify things
> further?
> 
>> diff --git a/block/backup.c b/block/backup.c
>> index 435414e964..fbe7ce19e1 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
> 
> [...]
> 
>> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
>> +                                             &offset, &bytes))
>> +    {
>> +        uint64_t cluster = offset / job->cluster_size;
>> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>>   
>> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
>> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
>> -        if (next_cluster >= end) {
>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
> 
> Why the +1?  Shouldn't the division for last_cluster round up instead?
> 
>> +
>> +        offset = (last_cluster + 1) * job->cluster_size;
> 
> Same here.

The last cluster is not "end", it's the last dirty cluster. So the number of dirty clusters is last_cluster - cluster + 1, and the next offset is calculated through +1 too.

If I round up the division result, I'll get the last cluster in most cases, but "end" (the one after the last) for the case when offset % job->cluster_size == 0, so how would I use it?


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity Vladimir Sementsov-Ogievskiy
@ 2019-01-14 14:10   ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-14 14:10 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> We are going to share this bitmap between backup and backup-top filter
> driver, so let's share something more meaningful. It also simplifies
> some calculations.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 48 +++++++++++++++++++++++-------------------------
>  1 file changed, 23 insertions(+), 25 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-14 14:01     ` Vladimir Sementsov-Ogievskiy
@ 2019-01-14 14:13       ` Max Reitz
  2019-01-14 14:48         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-14 14:13 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow


On 14.01.19 15:01, Vladimir Sementsov-Ogievskiy wrote:
> 14.01.2019 16:10, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Simplify backup_incremental_init_copy_bitmap using the function
>>> bdrv_dirty_bitmap_next_dirty_area.
>>>
>>> Note: move to job->len instead of bitmap size: it should not matter but
>>> less code.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/backup.c | 40 ++++++++++++----------------------------
>>>   1 file changed, 12 insertions(+), 28 deletions(-)
>>
>> Overall: What is this function even supposed to do?  To me, it looks
>> like it marks all areas in job->copy_bitmap dirty that are dirty in
>> job->sync_bitmap.
>>
>> If so, wouldn't just replacing this by hbitmap_merge() simplify things
>> further?
>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 435414e964..fbe7ce19e1 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>
>> [...]
>>
>>> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
>>> +                                             &offset, &bytes))
>>> +    {
>>> +        uint64_t cluster = offset / job->cluster_size;
>>> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>>>   
>>> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
>>> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
>>> -        if (next_cluster >= end) {
>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>
>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>
>>> +
>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>
>> Same here.
> 
> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
> 
> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?

Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
offset + bytes), i.e. where "offset + bytes" is the first clean offset?

Max



* Re: [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child Vladimir Sementsov-Ogievskiy
@ 2019-01-14 14:32   ` Max Reitz
  2019-01-14 16:13     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-14 14:32 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> As the comment already says, we don't want to create loops in
> parent->child relations. So, when we try to append @to to @c, we should
> check that @c is not in @to children subtree, and we should check it
> recursively, not only at the first level. The patch provides a BFS-based
> search to check the relations.
> 
> This is needed for further fleecing-hook filter usage: we need to
> append it to source, when the hook is already a parent of target, and
> source may be in a backing chain of target (fleecing-scheme). So, on
> appending, the hook should not become a child (direct or through
> children subtree) of the target.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block.c | 33 ++++++++++++++++++++++++++++-----
>  1 file changed, 28 insertions(+), 5 deletions(-)

Just two technical details:

> diff --git a/block.c b/block.c
> index 4f5ff2cc12..8f04f293da 100644
> --- a/block.c
> +++ b/block.c
> @@ -3536,7 +3536,7 @@ void bdrv_close_all(void)
>  
>  static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>  {
> -    BdrvChild *to_c;
> +    GList *queue = NULL, *pos;

GSList should be sufficient, I think.

>  
>      if (c->role->stay_at_node) {
>          return false;
> @@ -3572,13 +3572,36 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>       * if A is a child of B, that means we cannot replace A by B there
>       * because that would create a loop.  Silently detaching A from B
>       * is also not really an option.  So overall just leaving A in
> -     * place there is the most sensible choice. */
> -    QLIST_FOREACH(to_c, &to->children, next) {
> -        if (to_c == c) {
> -            return false;
> +     * place there is the most sensible choice.
> +     *
> +     * We would also create a loop in any cases where @c is only
> +     * indirectly referenced by @to. Prevent this by returning false
> +     * if @c is found (by breadth-first search) anywhere in the whole
> +     * subtree of @to.
> +     */
> +
> +    pos = queue = g_list_append(queue, to);
> +    while (pos) {
> +        BlockDriverState *v = pos->data;
> +        BdrvChild *c2;
> +
> +        QLIST_FOREACH(c2, &v->children, next) {
> +            if (c2 == c) {
> +                g_list_free(queue);
> +                return false;
> +            }
> +
> +            if (g_list_find(queue, c2->bs)) {
> +                continue;
> +            }
> +
> +            queue = g_list_append(queue, c2->bs);

Appending to pos should be a bit quicker, I presume.  (And you can
probably ignore the result, or just assert that the first argument was
returned.)

Max

>          }
> +
> +        pos = pos->next;
>      }
>  
> +    g_list_free(queue);
>      return true;
>  }
>  
> 




* Re: [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent Vladimir Sementsov-Ogievskiy
@ 2019-01-14 14:36   ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-14 14:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Correctly rewrite the default argument. After the patch, the function can
> be used for image formats other than the default test-chosen one.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/iotests.py | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>



* Re: [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name Vladimir Sementsov-Ogievskiy
@ 2019-01-14 14:46   ` Max Reitz
  2019-01-14 16:06     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-14 14:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> After node graph changes, we may not be able to resume_drive by device
> name (backing files are not recursively searched). So, let's allow
> resuming by node name. Set a constant name for breakpoints to avoid
> introducing extra parameters.

Hm, I don't quite understand this reason.  Is this so you can create
breakpoints on one node (which falls through to the first blkdebug node)
and then remove them from another (falling through to the same blkdebug
node)?

Wouldn't it be better to let the user specify the breakpoint name?

Max

> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/iotests.py | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 76877ad584..c9779f432f 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -415,11 +415,11 @@ class VM(qtest.QEMUQtestMachine):
>              self.pause_drive(drive, "write_aio")
>              return
>          self.qmp('human-monitor-command',
> -                    command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
> +                    command_line='qemu-io %s "break %s bp_0"' % (drive, event))
>  
>      def resume_drive(self, drive):
>          self.qmp('human-monitor-command',
> -                    command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
> +                    command_line='qemu-io %s "remove_break bp_0"' % (drive))
>  
>      def hmp_qemu_io(self, drive, cmd):
>          '''Write to a given drive using an HMP command'''
> @@ -543,13 +543,14 @@ class QMPTestCase(unittest.TestCase):
>          self.assertEqual(self.vm.flatten_qmp_object(json.loads(json_filename[5:])),
>                           self.vm.flatten_qmp_object(reference))
>  
> -    def cancel_and_wait(self, drive='drive0', force=False, resume=False):
> +    def cancel_and_wait(self, drive='drive0', force=False, resume=False,
> +                        resume_node=None):
>          '''Cancel a block job and wait for it to finish, returning the event'''
>          result = self.vm.qmp('block-job-cancel', device=drive, force=force)
>          self.assert_qmp(result, 'return', {})
>  
>          if resume:
> -            self.vm.resume_drive(drive)
> +            self.vm.resume_drive(resume_node or drive)
>  
>          cancelled = False
>          result = None
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-14 14:13       ` Max Reitz
@ 2019-01-14 14:48         ` Vladimir Sementsov-Ogievskiy
  2019-01-16 13:05           ` Max Reitz
  0 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-14 14:48 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

14.01.2019 17:13, Max Reitz wrote:
> On 14.01.19 15:01, Vladimir Sementsov-Ogievskiy wrote:
>> 14.01.2019 16:10, Max Reitz wrote:
>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>> Simplify backup_incremental_init_copy_bitmap using the function
>>>> bdrv_dirty_bitmap_next_dirty_area.
>>>>
>>>> Note: move to job->len instead of bitmap size: it should not matter but
>>>> less code.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/backup.c | 40 ++++++++++++----------------------------
>>>>    1 file changed, 12 insertions(+), 28 deletions(-)
>>>
>>> Overall: What is this function even supposed to do?  To me, it looks
>>> like it marks all areas in job->copy_bitmap dirty that are dirty in
>>> job->sync_bitmap.
>>>
>>> If so, wouldn't just replacing this by hbitmap_merge() simplify things
>>> further?
>>>
>>>> diff --git a/block/backup.c b/block/backup.c
>>>> index 435414e964..fbe7ce19e1 100644
>>>> --- a/block/backup.c
>>>> +++ b/block/backup.c
>>>> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>>
>>> [...]
>>>
>>>> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
>>>> +                                             &offset, &bytes))
>>>> +    {
>>>> +        uint64_t cluster = offset / job->cluster_size;
>>>> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>>>>    
>>>> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
>>>> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
>>>> -        if (next_cluster >= end) {
>>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>>
>>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>>
>>>> +
>>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>>
>>> Same here.
>>
>> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
>>
>> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?
> 
> Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
> offset + bytes), i.e. where "offset + bytes" is the first clean offset?

Oops, you are right. Then I need:
uint64_t last_cluster = (offset + bytes - 1) / job->cluster_size;

> 
> Max
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2019-01-14 14:46   ` Max Reitz
@ 2019-01-14 16:06     ` Vladimir Sementsov-Ogievskiy
  2019-01-16 13:11       ` Max Reitz
  0 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-14 16:06 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

14.01.2019 17:46, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> After node graph changes, we may not be able to resume_drive by device
>> name (backing files are not recursively searched). So, lets allow to
>> resume by node-name. Set constant name for breakpoints, to avoid
>> introducing extra parameters.
> 
> Hm, I don't quite understand this reason.  Is this so you can create
> breakpoints on one node (which falls through to the first blkdebug node)
> and then remove them from another (falling through to the same blkdebug
> node)?

Adding/removing a breakpoint goes through ->file children, but my filter links
the active disk as ->backing. So, before the block job starts, we can insert a
breakpoint by device name. But then, once the filter is inserted, we can't
remove the breakpoint, because my filter hides the blkdebug node (together
with the active disk) under its ->backing link.

Maybe the right solution would be to support backing links in
bdrv_debug_breakpoint() and bdrv_debug_remove_breakpoint() for the case when
there is no file child.

But being unsure about the right behavior, I've decided to adjust the test.

What about just doing both add and remove breakpoint through the blkdebug
node-name, to make it less weird?

> 
> Wouldn't it be better to let the user specify the breakpoint name?

It's not needed now. With the current naming we can have one breakpoint per
device, so we don't need different names.

> 
> Max
> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   tests/qemu-iotests/iotests.py | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>> index 76877ad584..c9779f432f 100644
>> --- a/tests/qemu-iotests/iotests.py
>> +++ b/tests/qemu-iotests/iotests.py
>> @@ -415,11 +415,11 @@ class VM(qtest.QEMUQtestMachine):
>>               self.pause_drive(drive, "write_aio")
>>               return
>>           self.qmp('human-monitor-command',
>> -                    command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
>> +                    command_line='qemu-io %s "break %s bp_0"' % (drive, event))
>>   
>>       def resume_drive(self, drive):
>>           self.qmp('human-monitor-command',
>> -                    command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
>> +                    command_line='qemu-io %s "remove_break bp_0"' % (drive))
>>   
>>       def hmp_qemu_io(self, drive, cmd):
>>           '''Write to a given drive using an HMP command'''
>> @@ -543,13 +543,14 @@ class QMPTestCase(unittest.TestCase):
>>           self.assertEqual(self.vm.flatten_qmp_object(json.loads(json_filename[5:])),
>>                            self.vm.flatten_qmp_object(reference))
>>   
>> -    def cancel_and_wait(self, drive='drive0', force=False, resume=False):
>> +    def cancel_and_wait(self, drive='drive0', force=False, resume=False,
>> +                        resume_node=None):
>>           '''Cancel a block job and wait for it to finish, returning the event'''
>>           result = self.vm.qmp('block-job-cancel', device=drive, force=force)
>>           self.assert_qmp(result, 'return', {})
>>   
>>           if resume:
>> -            self.vm.resume_drive(drive)
>> +            self.vm.resume_drive(resume_node or drive)
>>   
>>           cancelled = False
>>           result = None
>>
> 
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child
  2019-01-14 14:32   ` Max Reitz
@ 2019-01-14 16:13     ` Vladimir Sementsov-Ogievskiy
  2019-01-16 13:17       ` Max Reitz
  0 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-14 16:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

14.01.2019 17:32, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> As it already said in the comment, we don't want to create loops in
>> parent->child relations. So, when we try to append @to to @c, we should
>> check that @c is not in @to children subtree, and we should check it
>> recursively, not only the first level. The patch provides BFS-based
>> search, to check the relations.
>>
>> This is needed for further fleecing-hook filter usage: we need to
>> append it to source, when the hook is already a parent of target, and
>> source may be in a backing chain of target (fleecing-scheme). So, on
>> appending, the hook should not became a child (direct or through
>> children subtree) of the target.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block.c | 33 ++++++++++++++++++++++++++++-----
>>   1 file changed, 28 insertions(+), 5 deletions(-)
> 
> Just two technical details:
> 
>> diff --git a/block.c b/block.c
>> index 4f5ff2cc12..8f04f293da 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3536,7 +3536,7 @@ void bdrv_close_all(void)
>>   
>>   static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>>   {
>> -    BdrvChild *to_c;
>> +    GList *queue = NULL, *pos;
> 
> GSList should be sufficient, I think.
> 
>>   
>>       if (c->role->stay_at_node) {
>>           return false;
>> @@ -3572,13 +3572,36 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>>        * if A is a child of B, that means we cannot replace A by B there
>>        * because that would create a loop.  Silently detaching A from B
>>        * is also not really an option.  So overall just leaving A in
>> -     * place there is the most sensible choice. */
>> -    QLIST_FOREACH(to_c, &to->children, next) {
>> -        if (to_c == c) {
>> -            return false;
>> +     * place there is the most sensible choice.
>> +     *
>> +     * We would also create a loop in any cases where @c is only
>> +     * indirectly referenced by @to. Prevent this by returning false
>> +     * if @c is found (by breadth-first search) anywhere in the whole
>> +     * subtree of @to.
>> +     */
>> +
>> +    pos = queue = g_list_append(queue, to);
>> +    while (pos) {
>> +        BlockDriverState *v = pos->data;
>> +        BdrvChild *c2;
>> +
>> +        QLIST_FOREACH(c2, &v->children, next) {
>> +            if (c2 == c) {
>> +                g_list_free(queue);
>> +                return false;
>> +            }
>> +
>> +            if (g_list_find(queue, c2->bs)) {
>> +                continue;
>> +            }
>> +
>> +            queue = g_list_append(queue, c2->bs);
> 
> Appending to pos should be a bit quicker, I presume.  (And you can
> probably ignore the result, or just assert that the first argument was
> returned.)

So, even better would be to use GQueue; strange that I didn't.

> 
> Max
> 
>>           }
>> +
>> +        pos = pos->next;
>>       }
>>   
>> +    g_list_free(queue);
>>       return true;
>>   }
>>   
>>
> 
> 


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-14 14:48         ` Vladimir Sementsov-Ogievskiy
@ 2019-01-16 13:05           ` Max Reitz
  2019-01-23  8:20             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-16 13:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 2807 bytes --]

On 14.01.19 15:48, Vladimir Sementsov-Ogievskiy wrote:
> 14.01.2019 17:13, Max Reitz wrote:
>> On 14.01.19 15:01, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.01.2019 16:10, Max Reitz wrote:
>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Simplify backup_incremental_init_copy_bitmap using the function
>>>>> bdrv_dirty_bitmap_next_dirty_area.
>>>>>
>>>>> Note: move to job->len instead of bitmap size: it should not matter but
>>>>> less code.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    block/backup.c | 40 ++++++++++++----------------------------
>>>>>    1 file changed, 12 insertions(+), 28 deletions(-)
>>>>
>>>> Overall: What is this function even supposed to do?  To me, it looks
>>>> like it marks all areas in job->copy_bitmap dirty that are dirty in
>>>> job->sync_bitmap.
>>>>
>>>> If so, wouldn't just replacing this by hbitmap_merge() simplify things
>>>> further?
>>>>
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index 435414e964..fbe7ce19e1 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>>>
>>>> [...]
>>>>
>>>>> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
>>>>> +                                             &offset, &bytes))
>>>>> +    {
>>>>> +        uint64_t cluster = offset / job->cluster_size;
>>>>> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>>>>>    
>>>>> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
>>>>> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
>>>>> -        if (next_cluster >= end) {
>>>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>>>
>>>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>>>
>>>>> +
>>>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>>>
>>>> Same here.
>>>
>>> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
>>>
>>> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?
>>
>> Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
>> offset + bytes), i.e. where "offset + bytes" is the first clean offset?
> 
> oops, you are right. then I need
> uint64_t last_cluster = (offset + bytes - 1) / job->cluster_size;

That, or you just use a rounding up division and rename it from
last_cluster to end_cluster or first_clean_cluster or something (and
subsequently drop the +1s).

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread
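The cluster rounding agreed on in the thread above can be sketched as
standalone arithmetic (an illustrative sketch, not the actual block/backup.c
code): since bdrv_dirty_bitmap_next_dirty_area() returns a half-open range
[offset, offset + bytes), rounding the end up gives the first clean cluster
directly, with no +1 correction.

```python
def dirty_clusters(offset, nbytes, cluster_size):
    """Clusters covered by the half-open dirty range [offset, offset + nbytes)."""
    first_cluster = offset // cluster_size
    # Round up: offset + nbytes is the first clean byte, so the rounded-up
    # quotient is the first clean cluster (named end_cluster, as suggested).
    end_cluster = (offset + nbytes + cluster_size - 1) // cluster_size
    return first_cluster, end_cluster - first_cluster

# A range ending exactly on a cluster boundary must not spill into the next cluster:
assert dirty_clusters(0, 65536, 65536) == (0, 1)
# A range crossing one cluster boundary covers both clusters:
assert dirty_clusters(65536 - 512, 1024, 65536) == (0, 2)
```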

* Re: [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2019-01-14 16:06     ` Vladimir Sementsov-Ogievskiy
@ 2019-01-16 13:11       ` Max Reitz
  2019-01-23 13:22         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-16 13:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 3804 bytes --]

On 14.01.19 17:06, Vladimir Sementsov-Ogievskiy wrote:
> 14.01.2019 17:46, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> After node graph changes, we may not be able to resume_drive by device
>>> name (backing files are not recursively searched). So, lets allow to
>>> resume by node-name. Set constant name for breakpoints, to avoid
>>> introducing extra parameters.
>>
>> Hm, I don't quite understand this reason.  Is this so you can create
>> breakpoints on one node (which falls through to the first blkdebug node)
>> and then remove them from another (falling through to the same blkdebug
>> node)?
> 
> add/remove breakpoint goes through ->file children, but my filter links
> active disk as ->backing. So, before block-job start we can insert breakpoint
> by device name. But then, when filter inserted, we can't remove breakpoint,
> because my filter hides blkdebug with active disk under ->backing link.
> 
> Maybe, right solution would be support backing links in bdrv_debug_breakpoint()
> and bdrv_debug_remove_breakpoint() for the case when there is no file child.
> 
> But being unsure about right behavior, I've decided to adjust the test.
> 
> What about just do both  add/remove breakpoint through blkdebug node-name, to
> make it less weird?

Yes, I was mostly wondering about the "set constant name for
breakpoints".  It doesn't seem necessary to me; if you use node names,
you could address the blkdebug node directly, I would think.

Max

>> Wouldn't it be better to let the user specify the breakpoint name?
> 
> it's not needed now. with current naming we can have one break-point in device,
> so we don't need different names.
> 
>>
>> Max
>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   tests/qemu-iotests/iotests.py | 9 +++++----
>>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>>> index 76877ad584..c9779f432f 100644
>>> --- a/tests/qemu-iotests/iotests.py
>>> +++ b/tests/qemu-iotests/iotests.py
>>> @@ -415,11 +415,11 @@ class VM(qtest.QEMUQtestMachine):
>>>               self.pause_drive(drive, "write_aio")
>>>               return
>>>           self.qmp('human-monitor-command',
>>> -                    command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
>>> +                    command_line='qemu-io %s "break %s bp_0"' % (drive, event))
>>>   
>>>       def resume_drive(self, drive):
>>>           self.qmp('human-monitor-command',
>>> -                    command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
>>> +                    command_line='qemu-io %s "remove_break bp_0"' % (drive))
>>>   
>>>       def hmp_qemu_io(self, drive, cmd):
>>>           '''Write to a given drive using an HMP command'''
>>> @@ -543,13 +543,14 @@ class QMPTestCase(unittest.TestCase):
>>>           self.assertEqual(self.vm.flatten_qmp_object(json.loads(json_filename[5:])),
>>>                            self.vm.flatten_qmp_object(reference))
>>>   
>>> -    def cancel_and_wait(self, drive='drive0', force=False, resume=False):
>>> +    def cancel_and_wait(self, drive='drive0', force=False, resume=False,
>>> +                        resume_node=None):
>>>           '''Cancel a block job and wait for it to finish, returning the event'''
>>>           result = self.vm.qmp('block-job-cancel', device=drive, force=force)
>>>           self.assert_qmp(result, 'return', {})
>>>   
>>>           if resume:
>>> -            self.vm.resume_drive(drive)
>>> +            self.vm.resume_drive(resume_node or drive)
>>>   
>>>           cancelled = False
>>>           result = None
>>>
>>
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread
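The resume_node fallback added to cancel_and_wait() in the patch above relies
on Python's `or` returning its first truthy operand; a minimal sketch of that
pattern (hypothetical helper name, for illustration only):

```python
def pick_resume_target(drive, resume_node=None):
    # Mirrors `resume_node or drive` in cancel_and_wait(): prefer the explicit
    # node-name when given, fall back to the device name otherwise.
    return resume_node or drive

# Without a node-name, the device name is used, as before the patch:
assert pick_resume_target('drive0') == 'drive0'
# With a node-name, it takes precedence (needed once the filter hides the
# blkdebug node behind a ->backing link):
assert pick_resume_target('drive0', resume_node='blkdebug0') == 'blkdebug0'
```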

* Re: [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child
  2019-01-14 16:13     ` Vladimir Sementsov-Ogievskiy
@ 2019-01-16 13:17       ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-16 13:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 3181 bytes --]

On 14.01.19 17:13, Vladimir Sementsov-Ogievskiy wrote:
> 14.01.2019 17:32, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> As it already said in the comment, we don't want to create loops in
>>> parent->child relations. So, when we try to append @to to @c, we should
>>> check that @c is not in @to children subtree, and we should check it
>>> recursively, not only the first level. The patch provides BFS-based
>>> search, to check the relations.
>>>
>>> This is needed for further fleecing-hook filter usage: we need to
>>> append it to source, when the hook is already a parent of target, and
>>> source may be in a backing chain of target (fleecing-scheme). So, on
>>> appending, the hook should not became a child (direct or through
>>> children subtree) of the target.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block.c | 33 ++++++++++++++++++++++++++++-----
>>>   1 file changed, 28 insertions(+), 5 deletions(-)
>>
>> Just two technical details:
>>
>>> diff --git a/block.c b/block.c
>>> index 4f5ff2cc12..8f04f293da 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -3536,7 +3536,7 @@ void bdrv_close_all(void)
>>>   
>>>   static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>>>   {
>>> -    BdrvChild *to_c;
>>> +    GList *queue = NULL, *pos;
>>
>> GSList should be sufficient, I think.
>>
>>>   
>>>       if (c->role->stay_at_node) {
>>>           return false;
>>> @@ -3572,13 +3572,36 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>>>        * if A is a child of B, that means we cannot replace A by B there
>>>        * because that would create a loop.  Silently detaching A from B
>>>        * is also not really an option.  So overall just leaving A in
>>> -     * place there is the most sensible choice. */
>>> -    QLIST_FOREACH(to_c, &to->children, next) {
>>> -        if (to_c == c) {
>>> -            return false;
>>> +     * place there is the most sensible choice.
>>> +     *
>>> +     * We would also create a loop in any cases where @c is only
>>> +     * indirectly referenced by @to. Prevent this by returning false
>>> +     * if @c is found (by breadth-first search) anywhere in the whole
>>> +     * subtree of @to.
>>> +     */
>>> +
>>> +    pos = queue = g_list_append(queue, to);
>>> +    while (pos) {
>>> +        BlockDriverState *v = pos->data;
>>> +        BdrvChild *c2;
>>> +
>>> +        QLIST_FOREACH(c2, &v->children, next) {
>>> +            if (c2 == c) {
>>> +                g_list_free(queue);
>>> +                return false;
>>> +            }
>>> +
>>> +            if (g_list_find(queue, c2->bs)) {
>>> +                continue;
>>> +            }
>>> +
>>> +            queue = g_list_append(queue, c2->bs);
>>
>> Appending to pos should be a bit quicker, I presume.  (And you can
>> probably ignore the result, or just assert that the first argument was
>> returned.)
> 
> So, even better is to use GQueue, strange that I didn't.

Because it has a tail pointer built in?  Huh, yeah, that's even better. :-)

Max


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread
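The breadth-first loop check discussed in the thread above can be sketched in
Python (hypothetical node objects standing in for BlockDriverState/BdrvChild;
a deque plays the role of the GQueue, and a visited set replaces the
g_list_find() scan):

```python
from collections import deque

class Node:
    """Toy stand-in for a block graph node with a list of child nodes."""
    def __init__(self, children):
        self.children = children

def found_in_subtree(c_bs, to):
    """Return True if c_bs is reachable anywhere in the subtree of `to`
    (BFS with a visited set) -- the case where should_update_child() must
    return false, because replacing c_bs by `to` would create a loop."""
    queue = deque([to])
    visited = {id(to)}
    while queue:
        node = queue.popleft()
        for child in node.children:
            if child is c_bs:
                return True
            if id(child) not in visited:
                visited.add(id(child))
                queue.append(child)
    return False

# source is only *indirectly* referenced by `to`, so the first-level-only
# check would miss it, but BFS finds it:
source = Node([])
mid = Node([source])
to = Node([mid])
assert found_in_subtree(source, to) is True
assert found_in_subtree(Node([]), to) is False
```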

* Re: [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job Vladimir Sementsov-Ogievskiy
@ 2019-01-16 13:48   ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-16 13:48 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 465 bytes --]

On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Backup will append fleecing-hook node above source node, so, we can't
> resume by device name (because resume don't search recursively through
> backing chain).
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  tests/qemu-iotests/055 | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:02   ` Max Reitz
  2019-01-17 12:13     ` Vladimir Sementsov-Ogievskiy
  2019-04-13 16:08       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-16 16:02 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 15568 bytes --]

On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Backup-top filter does copy-before-write operation. It should be
> inserted above active disk and has a target node for CBW, like the
> following:
> 
>     +-------+
>     | Guest |
>     +---+---+
>         |r,w
>         v
>     +---+-----------+  target   +---------------+
>     | backup_top    |---------->| target(qcow2) |
>     +---+-----------+   CBW     +---+-----------+
>         |
> backing |r,w
>         v
>     +---+---------+
>     | Active disk |
>     +-------------+
> 
> The driver will be used in backup instead of write-notifiers.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup-top.h  |  43 +++++++
>  block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
>  block/Makefile.objs |   2 +
>  3 files changed, 351 insertions(+)
>  create mode 100644 block/backup-top.h
>  create mode 100644 block/backup-top.c

Looks good to me overall, comments below:

> diff --git a/block/backup-top.h b/block/backup-top.h
> new file mode 100644
> index 0000000000..abe4dda574
> --- /dev/null
> +++ b/block/backup-top.h
> @@ -0,0 +1,43 @@
> +/*
> + * backup-top filter driver
> + *
> + * The driver performs Copy-Before-Write (CBW) operation: it is injected above
> + * some node, and before each write it copies _old_ data to the target node.
> + *
> + * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +

Include guards are missing here.

> +#include "qemu/osdep.h"
> +
> +#include "block/block_int.h"
> +
> +typedef struct BDRVBackupTopState {
> +    HBitmap *copy_bitmap; /* what should be copied to @target on guest write. */
> +    BdrvChild *target;
> +
> +    uint64_t bytes_copied;
> +} BDRVBackupTopState;
> +
> +void bdrv_backup_top_drop(BlockDriverState *bs);
> +uint64_t bdrv_backup_top_progress(BlockDriverState *bs);
> +
> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
> +                                         BlockDriverState *target,
> +                                         HBitmap *copy_bitmap,
> +                                         Error **errp);

A comment describing at least parts of the interface would be nice; for
instance, that @target will be forced into the same AioContext as
@source, or that the backup-top node will be an implicit node.

> diff --git a/block/backup-top.c b/block/backup-top.c
> new file mode 100644
> index 0000000000..0e7b3e3142
> --- /dev/null
> +++ b/block/backup-top.c
> @@ -0,0 +1,306 @@
> +/*
> + * backup-top filter driver
> + *
> + * The driver performs Copy-Before-Write (CBW) operation: it is injected above
> + * some node, and before each write it copies _old_ data to the target node.
> + *
> + * Copyright (c) 2018 Virtuozzo International GmbH. All rights reserved.
> + *
> + * Author:
> + *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "qemu/cutils.h"
> +#include "qapi/error.h"
> +#include "block/block_int.h"
> +#include "block/qdict.h"
> +
> +#include "block/backup-top.h"
> +
> +static coroutine_fn int backup_top_co_preadv(
> +        BlockDriverState *bs, uint64_t offset, uint64_t bytes,
> +        QEMUIOVector *qiov, int flags)
> +{
> +    /*
> +     * Features to be implemented:
> +     * F1. COR. save read data to fleecing target for fast access
> +     *     (to reduce reads). This possibly may be done with use of copy-on-read
> +     *     filter, but we need an ability to make COR requests optional: for
> +     *     example, if target is a ram-cache, and if it is full now, we should
> +     *     skip doing COR request, as it is actually not necessary.
> +     *
> +     * F2. Feature for guest: read from fleecing target if data is in ram-cache
> +     *     and is unchanged
> +     */
> +
> +    return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
> +}
> +
> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
> +                                          uint64_t bytes)

I don't know if there is a real guideline, but usually parameters are
aligned at the opening parenthesis.

> +{
> +    int ret = 0;
> +    BDRVBackupTopState *s = bs->opaque;
> +    uint64_t gran = 1UL << hbitmap_granularity(s->copy_bitmap);
> +    uint64_t end = QEMU_ALIGN_UP(offset + bytes, gran);
> +    uint64_t off = QEMU_ALIGN_DOWN(offset, gran), len;
> +    size_t align = MAX(bdrv_opt_mem_align(bs->backing->bs),
> +                       bdrv_opt_mem_align(s->target->bs));
> +    struct iovec iov = {
> +        .iov_base = qemu_memalign(align, end - off),

I think it would be more correct to use qemu_blockalign() here and set
bs->bl.opt_mem_alignment somewhere.

> +        .iov_len = end - off

How about just leaving this out here... [1]

> +    };
> +    QEMUIOVector qiov;
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);

[1] ...and move this... [1]

> +
> +    /*
> +     * Features to be implemented:
> +     * F3. parallelize copying loop
> +     * F4. detect zeros

Isn't there detect-zeroes for that?

> +     * F5. use block_status ?
> +     * F6. don't copy clusters which are already cached by COR [see F1]
> +     * F7. if target is ram-cache and it is full, there should be a possibility
> +     *     to drop not necessary data (cached by COR [see F1]) to handle CBW
> +     *     fast.

Another idea: Read all dirty data from bs->backing (in parallel, more or
less), and then perform all writes (to bs->backing and s->target) in
parallel as well.

(So the writes do not have to wait on one another)

> +     */
> +
> +    len = end - off;
> +    while (hbitmap_next_dirty_area(s->copy_bitmap, &off, &len)) {
> +        iov.iov_len = qiov.size = len;

[1] ...here, and not setting qiov.size?

> +
> +        hbitmap_reset(s->copy_bitmap, off, len);
> +
> +        ret = bdrv_co_preadv(bs->backing, off, len, &qiov,
> +                             BDRV_REQ_NO_SERIALISING);
> +        if (ret < 0) {
> +            hbitmap_set(s->copy_bitmap, off, len);
> +            goto finish;
> +        }
> +
> +        ret = bdrv_co_pwritev(s->target, off, len, &qiov, BDRV_REQ_SERIALISING);
> +        if (ret < 0) {
> +            hbitmap_set(s->copy_bitmap, off, len);
> +            goto finish;
> +        }
> +
> +        s->bytes_copied += len;
> +        off += len;
> +        if (off >= end) {
> +            break;
> +        }
> +        len = end - off;
> +    }
> +
> +finish:
> +    qemu_vfree(iov.iov_base);
> +
> +    /*
> +     * F8. we fail guest request in case of error. We can alter it by
> +     * possibility to fail copying process instead, or retry several times, or
> +     * may be guest pause, etc.
> +     */
> +    return ret;
> +}
> +
> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
> +                                                  int64_t offset, int bytes)

(Indentation)

> +{
> +    int ret = backup_top_cbw(bs, offset, bytes);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    /*
> +     * Features to be implemented:
> +     * F9. possibility of lazy discard: just defer the discard after fleecing
> +     *     completion. If write (or new discard) occurs to the same area, just
> +     *     drop deferred discard.
> +     */
> +
> +    return bdrv_co_pdiscard(bs->backing, offset, bytes);
> +}
> +
> +static int coroutine_fn backup_top_co_pwrite_zeroes(BlockDriverState *bs,
> +        int64_t offset, int bytes, BdrvRequestFlags flags)
> +{
> +    int ret = backup_top_cbw(bs, offset, bytes);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    return bdrv_co_pwrite_zeroes(bs->backing, offset, bytes, flags);
> +}
> +
> +static coroutine_fn int backup_top_co_pwritev(BlockDriverState *bs,
> +                                                 uint64_t offset,
> +                                                 uint64_t bytes,
> +                                                 QEMUIOVector *qiov, int flags)

(Indentation)

> +{
> +    int ret = backup_top_cbw(bs, offset, bytes);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
> +}
> +
> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
> +{
> +    if (!bs->backing) {
> +        return 0;
> +    }
> +
> +    return bdrv_co_flush(bs->backing->bs);

Why not the target as well?

> +}
> +
> +static void backup_top_refresh_filename(BlockDriverState *bs, QDict *opts)
> +{
> +    if (bs->backing == NULL) {
> +        /*
> +         * we can be here after failed bdrv_attach_child in
> +         * bdrv_set_backing_hd
> +         */
> +        return;
> +    }
> +    bdrv_refresh_filename(bs->backing->bs);
> +    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
> +            bs->backing->bs->filename);
> +}
> +
> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
> +                                       const BdrvChildRole *role,
> +                                       BlockReopenQueue *reopen_queue,
> +                                       uint64_t perm, uint64_t shared,
> +                                       uint64_t *nperm, uint64_t *nshared)

(Indentation)

> +{
> +    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, nperm,
> +                              nshared);
> +
> +    if (role == &child_file) {
> +        /*
> +         * share write to target, to not interfere guest writes to it's disk

*interfere with guest writes to its disk

> +         * which will be in target backing chain
> +         */
> +        *nshared = *nshared | BLK_PERM_WRITE;
> +        *nperm = *nperm | BLK_PERM_WRITE;

Why do you need to take the write permission?  This filter does not
perform any writes unless it receives writes, right?  (So
bdrv_filter_default_perms() should take care of this.)

> +    } else {
> +        *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
> +    }
> +}
> +
> +BlockDriver bdrv_backup_top_filter = {
> +    .format_name = "backup-top",
> +    .instance_size = sizeof(BDRVBackupTopState),
> +
> +    .bdrv_co_preadv             = backup_top_co_preadv,
> +    .bdrv_co_pwritev            = backup_top_co_pwritev,
> +    .bdrv_co_pwrite_zeroes      = backup_top_co_pwrite_zeroes,
> +    .bdrv_co_pdiscard           = backup_top_co_pdiscard,
> +    .bdrv_co_flush              = backup_top_co_flush,
> +
> +    .bdrv_co_block_status       = bdrv_co_block_status_from_backing,
> +
> +    .bdrv_refresh_filename      = backup_top_refresh_filename,
> +
> +    .bdrv_child_perm            = backup_top_child_perm,
> +
> +    .is_filter = true,
> +};
> +
> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
> +                                         BlockDriverState *target,
> +                                         HBitmap *copy_bitmap,
> +                                         Error **errp)
> +{
> +    Error *local_err = NULL;
> +    BDRVBackupTopState *state;
> +    BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter,
> +                                                 NULL, BDRV_O_RDWR, errp);
> +
> +    if (!top) {
> +        return NULL;
> +    }
> +
> +    top->implicit = true;
> +    top->total_sectors = source->total_sectors;
> +    top->opaque = state = g_new0(BDRVBackupTopState, 1);
> +    state->copy_bitmap = copy_bitmap;
> +
> +    bdrv_ref(target);
> +    state->target = bdrv_attach_child(top, target, "target", &child_file, errp);
> +    if (!state->target) {
> +        bdrv_unref(target);
> +        bdrv_unref(top);
> +        return NULL;
> +    }
> +
> +    bdrv_set_aio_context(top, bdrv_get_aio_context(source));
> +    bdrv_set_aio_context(target, bdrv_get_aio_context(source));
> +
> +    bdrv_drained_begin(source);
> +
> +    bdrv_ref(top);
> +    bdrv_append(top, source, &local_err);
> +
> +    if (local_err) {
> +        bdrv_unref(top);

This is done automatically by bdrv_append().

> +    }
> +
> +    bdrv_drained_end(source);
> +
> +    if (local_err) {
> +        bdrv_unref_child(top, state->target);
> +        bdrv_unref(top);
> +        error_propagate(errp, local_err);
> +        return NULL;
> +    }
> +
> +    return top;
> +}
> +
> +void bdrv_backup_top_drop(BlockDriverState *bs)
> +{
> +    BDRVBackupTopState *s = bs->opaque;
> +
> +    AioContext *aio_context = bdrv_get_aio_context(bs);
> +
> +    aio_context_acquire(aio_context);
> +
> +    bdrv_drained_begin(bs);
> +
> +    bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, &error_abort);
> +    bdrv_replace_node(bs, backing_bs(bs), &error_abort);
> +    bdrv_set_backing_hd(bs, NULL, &error_abort);

This is done automatically in bdrv_close(), and after bs has been
replaced by backing_bs(bs), I don't think new requests should come in,
so I don't think this needs to be done here.

> +
> +    bdrv_drained_end(bs);
> +
> +    if (s->target) {
> +        bdrv_unref_child(bs, s->target);
> +    }

And this should be done in a .bdrv_close() implementation, I think.

Max

> +    bdrv_unref(bs);
> +
> +    aio_context_release(aio_context);
> +}
> +
> +uint64_t bdrv_backup_top_progress(BlockDriverState *bs)
> +{
> +    BDRVBackupTopState *s = bs->opaque;
> +
> +    return s->bytes_copied;
> +}
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 7a81892a52..9072115c09 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -40,6 +40,8 @@ block-obj-y += throttle.o copy-on-read.o
>  
>  block-obj-y += crypto.o
>  
> +block-obj-y += backup-top.o
> +
>  common-obj-y += stream.o
>  
>  nfs.o-libs         := $(LIBNFS_LIBS)
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:18   ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-16 16:18 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow

[-- Attachment #1: Type: text/plain, Size: 434 bytes --]

On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Split out do_wait_serialising_requests with additional possibility to
> not actually wait but just check, that there is something to wait for.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/io.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
@ 2019-01-16 16:36   ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-16 16:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> From: Vladimir Sementsov-Ogievskiy <etendren@gmail.com>
> 
> Introduce lock/unlock range functionality, based on serialized
> requests. This is needed to refactor backup, dropping local
> tracked-request-like synchronization.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <etendren@gmail.com>
> ---
>  include/block/block_int.h |  4 ++++
>  block/io.c                | 45 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 48 insertions(+), 1 deletion(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
  2019-01-16 16:02   ` Max Reitz
@ 2019-01-17 12:13     ` Vladimir Sementsov-Ogievskiy
  2019-01-18 12:05       ` Max Reitz
  2019-04-13 16:08       ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-17 12:13 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

16.01.2019 19:02, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Backup-top filter does copy-before-write operation. It should be
>> inserted above active disk and has a target node for CBW, like the
>> following:
>>
>>      +-------+
>>      | Guest |
>>      +---+---+
>>          |r,w
>>          v
>>      +---+-----------+  target   +---------------+
>>      | backup_top    |---------->| target(qcow2) |
>>      +---+-----------+   CBW     +---+-----------+
>>          |
>> backing |r,w
>>          v
>>      +---+---------+
>>      | Active disk |
>>      +-------------+
>>
>> The driver will be used in backup instead of write-notifiers.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/backup-top.h  |  43 +++++++
>>   block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
>>   block/Makefile.objs |   2 +
>>   3 files changed, 351 insertions(+)
>>   create mode 100644 block/backup-top.h
>>   create mode 100644 block/backup-top.c
> 
> Looks good to me overall, comments below:
> 

[..]

 >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
 >> +                                          uint64_t bytes)

[..]

>> +
>> +    /*
>> +     * Features to be implemented:
>> +     * F3. parallelize copying loop
>> +     * F4. detect zeros
> 
> Isn't there detect-zeroes for that?

Hmm, interesting note. I've placed F4 here because we do detect zeros by hand in
the current backup code. Should we drop it from backup, or switch to just
enabling detect-zeroes on the target?

> 
>> +     * F5. use block_status ?
>> +     * F6. don't copy clusters which are already cached by COR [see F1]
>> +     * F7. if target is ram-cache and it is full, there should be a possibility
>> +     *     to drop not necessary data (cached by COR [see F1]) to handle CBW
>> +     *     fast.
> 
> Another idea: Read all dirty data from bs->backing (in parallel, more or
> less), and then perform all writes (to bs->backing and s->target) in
> parallel as well.
> 
> (So the writes do not have to wait on one another)

Yes, we can let the guest write continue after the read part of CBW, but we
can't always do that, as we want to limit the RAM used to store all these
in-flight requests. And I think that in the ram-cache scheme, the ram-cache
itself should actually be a kind of storage for in-flight requests. In that
terminology, if the ram-cache is full, we can't proceed with the guest write.
It's the same as hitting the limit on the in-flight request count.
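The back-pressure described here — stall the guest write when the ram-cache is full, exactly as if an in-flight request limit had been reached — boils down to a bounded reservation counter. A minimal sketch with invented names (not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented illustration: a fixed-capacity "ram-cache" holding the data
 * of in-flight copy-before-write requests. */
typedef struct {
    uint64_t used;
    uint64_t capacity;
} RamCache;

/* Reserve space for one in-flight request.  Failure means the cache is
 * full and the guest write must wait, the same as hitting an in-flight
 * request limit. */
static bool ram_cache_try_reserve(RamCache *c, uint64_t bytes)
{
    if (c->used + bytes > c->capacity) {
        return false;
    }
    c->used += bytes;
    return true;
}

/* Release the space once the request completes. */
static void ram_cache_release(RamCache *c, uint64_t bytes)
{
    assert(c->used >= bytes);
    c->used -= bytes;
}
```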

[..]

>> +
>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>> +                                                  int64_t offset, int bytes)
> 
> (Indentation)

Looks like that happened during the rename from fleecing_hook to backup_top.

Will fix all of these.

[..]

>> +
>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>> +{
>> +    if (!bs->backing) {
>> +        return 0;
>> +    }
>> +
>> +    return bdrv_co_flush(bs->backing->bs);
> 
> Why not the target as well?

Should we? It may be a guest flush; why flush the backup target on it?

[..]

>> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>> +                                       const BdrvChildRole *role,
>> +                                       BlockReopenQueue *reopen_queue,
>> +                                       uint64_t perm, uint64_t shared,
>> +                                       uint64_t *nperm, uint64_t *nshared)
> 
> (Indentation)
> 
>> +{
>> +    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, nperm,
>> +                              nshared);
>> +
>> +    if (role == &child_file) {
>> +        /*
>> +         * share write to target, to not interfere guest writes to it's disk
> 
> *interfere with guest writes to its disk
> 
>> +         * which will be in target backing chain
>> +         */
>> +        *nshared = *nshared | BLK_PERM_WRITE;
>> +        *nperm = *nperm | BLK_PERM_WRITE;
> 
> Why do you need to take the write permission?  This filter does not
> perform any writes unless it receives writes, right?  (So
> bdrv_filter_default_perms() should take care of this.)

Your commit shed a bit of light on the permission system for me) OK, I'll check
whether it works without PERM_WRITE here.

> 
>> +    } else {
>> +        *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
>> +    }
>> +}
>> +

Thank you! For the other comments: agree or will try.


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
  2019-01-17 12:13     ` Vladimir Sementsov-Ogievskiy
@ 2019-01-18 12:05       ` Max Reitz
  2019-01-23 13:47         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-18 12:05 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow


On 17.01.19 13:13, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 19:02, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Backup-top filter does copy-before-write operation. It should be
>>> inserted above active disk and has a target node for CBW, like the
>>> following:
>>>
>>>      +-------+
>>>      | Guest |
>>>      +---+---+
>>>          |r,w
>>>          v
>>>      +---+-----------+  target   +---------------+
>>>      | backup_top    |---------->| target(qcow2) |
>>>      +---+-----------+   CBW     +---+-----------+
>>>          |
>>> backing |r,w
>>>          v
>>>      +---+---------+
>>>      | Active disk |
>>>      +-------------+
>>>
>>> The driver will be used in backup instead of write-notifiers.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/backup-top.h  |  43 +++++++
>>>   block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
>>>   block/Makefile.objs |   2 +
>>>   3 files changed, 351 insertions(+)
>>>   create mode 100644 block/backup-top.h
>>>   create mode 100644 block/backup-top.c
>>
>> Looks good to me overall, comments below:
>>
> 
> [..]
> 
>  >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
>  >> +                                          uint64_t bytes)
> 
> [..]
> 
>>> +
>>> +    /*
>>> +     * Features to be implemented:
>>> +     * F3. parallelize copying loop
>>> +     * F4. detect zeros
>>
>> Isn't there detect-zeroes for that?
> 
> Hmm, interesting note. I've placed F4 here because we do detect zeros by hand in
> the current backup code. Should we drop it from backup, or switch to just
> enabling detect-zeroes on the target?

Off the top of my head I don't know a reason why we wouldn't just set
detect-zeroes.  Maybe because there may be other writers to target which
should not use that setting?  But that scenario just seems wrong to me.

>>> +     * F5. use block_status ?
>>> +     * F6. don't copy clusters which are already cached by COR [see F1]
>>> +     * F7. if target is ram-cache and it is full, there should be a possibility
>>> +     *     to drop not necessary data (cached by COR [see F1]) to handle CBW
>>> +     *     fast.
>>
>> Another idea: Read all dirty data from bs->backing (in parallel, more or
>> less), and then perform all writes (to bs->backing and s->target) in
>> parallel as well.
>>
>> (So the writes do not have to wait on one another)
> 
> Yes, we can let the guest write continue after the read part of CBW, but we
> can't always do that, as we want to limit the RAM used to store all these
> in-flight requests.

But this code here already allocates a buffer to cover all of the area.

> And I think that in the ram-cache scheme, the ram-cache itself should actually
> be a kind of storage for in-flight requests. In that terminology, if the
> ram-cache is full, we can't proceed with the guest write. It's the same as
> hitting the limit on the in-flight request count.
> 
> [..]
> 
>>> +
>>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>>> +                                                  int64_t offset, int bytes)
>>
>> (Indentation)
> 
> Looks like that happened during the rename from fleecing_hook to backup_top.
> 
> Will fix all of these.

Ah :-)

> [..]
> 
>>> +
>>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>>> +{
>>> +    if (!bs->backing) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    return bdrv_co_flush(bs->backing->bs);
>>
>> Why not the target as well?
> 
> Should we? It may be a guest flush; why flush the backup target on it?

Hm...  Well, the current backup code didn't flush the target, and the
mirror filter doesn't either.  OK then.

Max

> [..]
> 
>>> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>>> +                                       const BdrvChildRole *role,
>>> +                                       BlockReopenQueue *reopen_queue,
>>> +                                       uint64_t perm, uint64_t shared,
>>> +                                       uint64_t *nperm, uint64_t *nshared)
>>
>> (Indentation)
>>
>>> +{
>>> +    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, nperm,
>>> +                              nshared);
>>> +
>>> +    if (role == &child_file) {
>>> +        /*
>>> +         * share write to target, to not interfere guest writes to it's disk
>>
>> *interfere with guest writes to its disk
>>
>>> +         * which will be in target backing chain
>>> +         */
>>> +        *nshared = *nshared | BLK_PERM_WRITE;
>>> +        *nperm = *nperm | BLK_PERM_WRITE;
>>
>> Why do you need to take the write permission?  This filter does not
>> perform any writes unless it receives writes, right?  (So
>> bdrv_filter_default_perms() should take care of this.)
> 
> Your commit shed a bit of light on the permission system for me) OK, I'll check
> whether it works without PERM_WRITE here.
> 
>>
>>> +    } else {
>>> +        *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
>>> +    }
>>> +}
>>> +
> 
> Thank you! For the other comments: agree or will try.
> 
> 





* Re: [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create Vladimir Sementsov-Ogievskiy
@ 2019-01-18 13:00   ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-18 13:00 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Move copy-bitmap find/create code. It's needed for the following
> commit, as we'll need copy_bitmap before actual block job creation. Do
> it in a separate commit to simplify review.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 69 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 40 insertions(+), 29 deletions(-)

Reviewed-by: Max Reitz <mreitz@redhat.com>




* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
@ 2019-01-18 14:56   ` Max Reitz
  2019-01-28 11:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-18 14:56 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, den, eblake, jsnow


On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> Drop write notifiers and use filter node instead. Changes:
> 
> 1. copy-before-write is now handled by the filter node, so drop all
>    is_write_notifier arguments.
> 
> 2. we don't have intersecting requests, so their handling is dropped.
> Instead, synchronization works as follows:
> when backup or backup-top starts copying of some area it firstly
> clears copy-bitmap bits, and nobody touches areas, not marked with
> dirty bits in copy-bitmap, so there is no intersection. Also, backup
> job copy operations are surrounded by a bdrv region lock, which is
> actually a serializing request, so as not to interfere with guest writes
> and not to read changed data from the source (before reading we clear the
> corresponding bit in the copy-bitmap, so this area is no longer handled by
> backup-top).
> 
> 3. To sync with in-flight requests we now just drain hook node, we
> don't need rw-lock.
> 
> 4. After the whole backup loop (top, full, incremental modes), we need
> to check for not copied clusters, which were under backup-top operation
> and we skipped them, but backup-top operation failed, error returned to
> the guest and dirty bits set back.
> 
> 5. Don't create additional blk, use backup-top children for copy
> operations.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 149 insertions(+), 136 deletions(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index 88c0242b4e..e332909fb7 100644
> --- a/block/backup.c
> +++ b/block/backup.c

[...]

> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>  static void backup_clean(Job *job)
>  {
>      BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> -    assert(s->target);
> -    blk_unref(s->target);
> +
> +    /* We must clean it to not crash in backup_drain. */
>      s->target = NULL;

Why not set s->source to NULL along with it?  It makes sense if you're
going to drop the backup-top node because both of these are its children.

>  
>      if (s->copy_bitmap) {
>          hbitmap_free(s->copy_bitmap);
>          s->copy_bitmap = NULL;
>      }
> +
> +    bdrv_backup_top_drop(s->backup_top);
>  }

[...]

> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>      bool error_is_read;
>      int64_t offset;
>      HBitmapIter hbi;
> +    void *lock = NULL;
>  
>      hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
> +    while (hbitmap_count(job->copy_bitmap)) {
> +        offset = hbitmap_iter_next(&hbi);
> +        if (offset == -1) {
> +            /*
> +             * we may have skipped some clusters, which were handled by
> +             * backup-top, but failed and finished by returning error to
> +             * the guest and set dirty bit back.
> +             */
> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> +            offset = hbitmap_iter_next(&hbi);
> +            assert(offset);

I think you want to assert "offset >= 0".

> +        }
> +
> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> +        /*
> +         * Dirty bit is set, which means that there are no in-flight
> +         * write requests on this area. We must succeed.
> +         */
> +        assert(lock);

I'm not sure that is true right now, but more on that below in backup_run().

> +
>          do {
>              if (yield_and_check(job)) {
> +                bdrv_co_unlock(lock);
>                  return 0;
>              }
> -            ret = backup_do_cow(job, offset,
> -                                job->cluster_size, &error_is_read, false);
> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>              if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>                             BLOCK_ERROR_ACTION_REPORT)
>              {
> +                bdrv_co_unlock(lock);
>                  return ret;
>              }
>          } while (ret < 0);
> +
> +        bdrv_co_unlock(lock);
> +        lock = NULL;

This statement seems unnecessary here.

>      }
>  
>      return 0;

[...]

> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>          hbitmap_set(s->copy_bitmap, 0, s->len);
>      }
>  
> -    s->before_write.notify = backup_before_write_notify;
> -    bdrv_add_before_write_notifier(bs, &s->before_write);
> -
>      if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>          /* All bits are set in copy_bitmap to allow any cluster to be copied.
>           * This does not actually require them to be copied. */
>          while (!job_is_cancelled(job)) {
> -            /* Yield until the job is cancelled.  We just let our before_write
> -             * notify callback service CoW requests. */
> +            /*
> +             * Yield until the job is cancelled.  We just let our backup-top
> +             * fileter driver service CbW requests.

*filter

> +             */
>              job_yield(job);
>          }
>      } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>          ret = backup_run_incremental(s);
>      } else {

[...]

> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>              if (alloced < 0) {
>                  ret = alloced;
>              } else {
> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
> +                    trace_backup_do_cow_skip(job, offset);
> +                    continue; /* already copied */
> +                }
> +                if (!lock) {
> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
> +                    /*
> +                     * Dirty bit is set, which means that there are no in-flight
> +                     * write requests on this area. We must succeed.
> +                     */
> +                    assert(lock);

What if I have a different parent node for the source that issues
concurrent writes?  This used to work fine because the before_write
notifier would still work.  After this patch, that would be broken
because those writes would not cause a CbW.

That's not so bad because we just have to make sure that all writes go
through the backup-top node.  That would make this assertion valid
again, too.  But that means we cannot share PERM_WRITE; see [1].

> +                }
>                  ret = backup_do_cow(s, offset, s->cluster_size,
> -                                    &error_is_read, false);
> +                                    &error_is_read);
>              }
>              if (ret < 0) {
>                  /* Depending on error action, fail now or retry cluster */
> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>                      break;
>                  } else {
>                      offset -= s->cluster_size;
> +                    retry = true;
>                      continue;
>                  }
>              }
>          }
> +        if (lock) {
> +            bdrv_co_unlock(lock);
> +            lock = NULL;
> +        }
> +        if (ret == 0 && !job_is_cancelled(job) &&
> +            hbitmap_count(s->copy_bitmap))
> +        {
> +            /*
> +             * we may have skipped some clusters, which were handled by
> +             * backup-top, but failed and finished by returning error to
> +             * the guest and set dirty bit back.

So it's a matter of a race?

> +             */
> +            goto iteration;
> +        }

Why not wrap everything in a do {} while (ret == 0 && !job_is...)
instead?  Because it would mean reindenting everything?
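The retry pattern in question — rescan until no dirty bits remain, since a failed CbW may set bits back behind the iterator — can be sketched with a toy bitmap (illustrative only, not the HBitmap API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_CLUSTERS 8

/* Toy stand-in for the copy bitmap: one flag per cluster. */
static size_t dirty_count(const bool *map)
{
    size_t i, n = 0;

    for (i = 0; i < MAP_CLUSTERS; i++) {
        n += map[i];
    }
    return n;
}

/* Copy one cluster.  On failure the dirty bit is set back, as the
 * error path of a copy-before-write operation does in the patch. */
static int copy_cluster(bool *map, size_t i, int *fail_once)
{
    map[i] = false;             /* clear before copying, like backup does */
    if (*fail_once) {
        *fail_once = 0;
        map[i] = true;          /* error: mark the cluster dirty again */
        return -1;
    }
    return 0;                   /* pretend the data was copied */
}

/* Keep rescanning until no dirty bits remain; a single linear pass is
 * not enough, because a failed copy can re-dirty a cluster that the
 * scan has already passed. */
static void copy_all(bool *map, int *fail_once)
{
    while (dirty_count(map) > 0) {
        size_t i;

        for (i = 0; i < MAP_CLUSTERS; i++) {
            if (map[i]) {
                copy_cluster(map, i, fail_once);
            }
        }
    }
}
```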

>      }
>  
> -    notifier_with_return_remove(&s->before_write);
> +    /* wait pending CBW operations in backup-top */
> +    bdrv_drain(s->backup_top);
>  
> -    /* wait until pending backup_do_cow() calls have completed */
> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
> +    job_progress_update(job, ret + backup_top_progress -

Why the "ret"?

> +                        s->backup_top_progress);
> +    s->backup_top_progress = backup_top_progress;

So the backup-top progress is ignored during basically all of the block
job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.

Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
 (And the while() loop of MODE_NONE)

>  
>      return ret;
>  }
> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>      int ret;
>      int64_t cluster_size;
>      HBitmap *copy_bitmap = NULL;
> +    BlockDriverState *backup_top = NULL;
> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;

BLK_PERM_ALL & ~BLK_PERM_RESIZE?

[1] But we probably do not want to share either PERM_WRITE or
PERM_WRITE_UNCHANGED because during the duration of the backup,
everything should go through the backup-top filter (not sure about
PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
node should enforce in backup_top_child_perm()?

>  
>      assert(bs);
>      assert(target);
> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>  
>      copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>  
> -    /* job->len is fixed, so we can't allow resize */
> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
> -                           BLK_PERM_CONSISTENT_READ,
> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> -                           speed, creation_flags, cb, opaque, errp);
> -    if (!job) {
> +    /*
> +     * bdrv_get_device_name will not help to find device name starting from
> +     * @bs after backup-top append,

Why not?  Since backup-top is appended, shouldn't all parents of @bs be
parents of @backup_top then?  (Making bdrv_get_parent_name() return the
same result)

>                                      so let's calculate job_id before. Do
> +     * it in the same way like block_job_create
> +     */
> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
> +        job_id = bdrv_get_device_name(bs);
> +    }
> +
> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
> +    if (!backup_top) {
>          goto error;
>      }
>  
> -    /* The target must match the source in size, so no resize here either */
> -    job->target = blk_new(BLK_PERM_WRITE,
> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
> -    ret = blk_insert_bs(job->target, target, errp);
> -    if (ret < 0) {
> +    /* job->len is fixed, so we can't allow resize */
> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,

Is there a reason you dropped PERM_CONSISTENT_READ here?

> +                           all_except_resize, speed, creation_flags,
> +                           cb, opaque, errp);
> +    if (!job) {
>          goto error;
>      }
>  
> +    job->source = backup_top->backing;
> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;

This looks really ugly.  I think as long as the block job performs
writes itself, it should use its own BlockBackend.

Alternatively, I think it would make sense for the backup-top filter to
offer functions that this job can use to issue writes to the target.
Then the backup job would no longer need direct access to the target as
a BdrvChild.

> +
>      job->on_source_error = on_source_error;
>      job->on_target_error = on_target_error;
>      job->sync_mode = sync_mode;

[...]

> @@ -712,6 +722,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>          backup_clean(&job->common.job);
>          job_early_fail(&job->common.job);
>      }
> +    if (backup_top) {
> +        bdrv_backup_top_drop(backup_top);
> +    }

While it shouldn't be a problem in practice, backup_top has a reference
to copy_bitmap, so that bitmap should be freed after backup_top is dropped.

Max

>  
>      return NULL;
>  }
> 




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-16 13:05           ` Max Reitz
@ 2019-01-23  8:20             ` Vladimir Sementsov-Ogievskiy
  2019-01-23 13:19               ` Max Reitz
  2019-01-23 14:36               ` Eric Blake
  0 siblings, 2 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-23  8:20 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

16.01.2019 16:05, Max Reitz wrote:
> On 14.01.19 15:48, Vladimir Sementsov-Ogievskiy wrote:
>> 14.01.2019 17:13, Max Reitz wrote:
>>> On 14.01.19 15:01, Vladimir Sementsov-Ogievskiy wrote:
>>>> 14.01.2019 16:10, Max Reitz wrote:
>>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Simplify backup_incremental_init_copy_bitmap using the function
>>>>>> bdrv_dirty_bitmap_next_dirty_area.
>>>>>>
>>>>>> Note: move to job->len instead of bitmap size: it should not matter but
>>>>>> less code.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>     block/backup.c | 40 ++++++++++++----------------------------
>>>>>>     1 file changed, 12 insertions(+), 28 deletions(-)
>>>>>
>>>>> Overall: What is this function even supposed to do?  To me, it looks
>>>>> like it marks all areas in job->copy_bitmap dirty that are dirty in
>>>>> job->sync_bitmap.
>>>>>
>>>>> If so, wouldn't just replacing this by hbitmap_merge() simplify things
>>>>> further?
>>>>>
>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>> index 435414e964..fbe7ce19e1 100644
>>>>>> --- a/block/backup.c
>>>>>> +++ b/block/backup.c
>>>>>> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>>>>
>>>>> [...]
>>>>>
>>>>>> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
>>>>>> +                                             &offset, &bytes))
>>>>>> +    {
>>>>>> +        uint64_t cluster = offset / job->cluster_size;
>>>>>> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>>>>>>     
>>>>>> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
>>>>>> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
>>>>>> -        if (next_cluster >= end) {
>>>>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>>>>
>>>>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>>>>
>>>>>> +
>>>>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>>>>
>>>>> Same here.
>>>>
>>>> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
>>>>
>>>> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?
>>>
>>> Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
>>> offset + bytes), i.e. where "offset + bytes" is the first clean offset?
>>
>> oops, you are right. then I need
>> uint64_t last_cluster = (offset + bytes - 1) / job->cluster_size;
> 
> That, or you just use a rounding up division and rename it from
> last_cluster to end_cluster or first_clean_cluster or something (and
> subsequently drop the +1s).

This will not work: ((offset + bytes) / job->cluster_size) is neither the first clean cluster
nor the end cluster. It is the cluster where the first clean bit is located, but it may
contain dirty bits too (or may not).

So, to rewrite based on end_cluster, it should be calculated as

(offset + bytes - 1) / job->cluster_size + 1

and I'm going to do so: one "+1" instead of two, and maybe a bit more understandable.

-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-23  8:20             ` Vladimir Sementsov-Ogievskiy
@ 2019-01-23 13:19               ` Max Reitz
  2019-01-23 14:36               ` Eric Blake
  1 sibling, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-23 13:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow


On 23.01.19 09:20, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 16:05, Max Reitz wrote:
>> On 14.01.19 15:48, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.01.2019 17:13, Max Reitz wrote:
>>>> On 14.01.19 15:01, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 14.01.2019 16:10, Max Reitz wrote:
>>>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> Simplify backup_incremental_init_copy_bitmap using the function
>>>>>>> bdrv_dirty_bitmap_next_dirty_area.
>>>>>>>
>>>>>>> Note: move to job->len instead of bitmap size: it should not matter but
>>>>>>> less code.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>> ---
>>>>>>>     block/backup.c | 40 ++++++++++++----------------------------
>>>>>>>     1 file changed, 12 insertions(+), 28 deletions(-)
>>>>>>
>>>>>> Overall: What is this function even supposed to do?  To me, it looks
>>>>>> like it marks all areas in job->copy_bitmap dirty that are dirty in
>>>>>> job->sync_bitmap.
>>>>>>
>>>>>> If so, wouldn't just replacing this by hbitmap_merge() simplify things
>>>>>> further?
>>>>>>
>>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>>> index 435414e964..fbe7ce19e1 100644
>>>>>>> --- a/block/backup.c
>>>>>>> +++ b/block/backup.c
>>>>>>> @@ -406,43 +406,27 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> +    while (bdrv_dirty_bitmap_next_dirty_area(job->sync_bitmap,
>>>>>>> +                                             &offset, &bytes))
>>>>>>> +    {
>>>>>>> +        uint64_t cluster = offset / job->cluster_size;
>>>>>>> +        uint64_t last_cluster = (offset + bytes) / job->cluster_size;
>>>>>>>     
>>>>>>> -        next_cluster = DIV_ROUND_UP(offset, job->cluster_size);
>>>>>>> -        hbitmap_set(job->copy_bitmap, cluster, next_cluster - cluster);
>>>>>>> -        if (next_cluster >= end) {
>>>>>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>>>>>
>>>>>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>>>>>
>>>>>>> +
>>>>>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>>>>>
>>>>>> Same here.
>>>>>
>>>>> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
>>>>>
>>>>> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?
>>>>
>>>> Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
>>>> offset + bytes), i.e. where "offset + bytes" is the first clean offset?
>>>
>>> oops, you are right. then I need
>>> uint64_t last_cluster = (offset + bytes - 1) / job->cluster_size;
>>
>> That, or you just use a rounding up division and rename it from
>> last_cluster to end_cluster or first_clean_cluster or something (and
>> subsequently drop the +1s).
> 
> This will not work, as ((offset + bytes) / job->cluster_size) is not first clean cluster
> or end cluster. It's a cluster, where is first clean bit located, but it may have dirty
> bits too (or, may not).
> 
> So, to rewrite based on end_cluster, it should be calculated as
> 
> (offset + bytes - 1) / job->cluster_size + 1

That's why I asked "Shouldn't the division for last_cluster round up
instead?"

Max

> and, I'm going to do so, one "+1" instead of two, and, may be, a bit more understandable.
> 





* Re: [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2019-01-16 13:11       ` Max Reitz
@ 2019-01-23 13:22         ` Vladimir Sementsov-Ogievskiy
  2019-01-23 13:31           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-23 13:22 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

16.01.2019 16:11, Max Reitz wrote:
> On 14.01.19 17:06, Vladimir Sementsov-Ogievskiy wrote:
>> 14.01.2019 17:46, Max Reitz wrote:
>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>> After node graph changes, we may not be able to resume_drive by device
>>>> name (backing files are not recursively searched). So, lets allow to
>>>> resume by node-name. Set constant name for breakpoints, to avoid
>>>> introducing extra parameters.
>>>
>>> Hm, I don't quite understand this reason.  Is this so you can create
>>> breakpoints on one node (which falls through to the first blkdebug node)
>>> and then remove them from another (falling through to the same blkdebug
>>> node)?
>>
>> add/remove breakpoint goes through ->file children, but my filter links
>> active disk as ->backing. So, before block-job start we can insert breakpoint
>> by device name. But then, when filter inserted, we can't remove breakpoint,
>> because my filter hides blkdebug with active disk under ->backing link.
>>
>> Maybe, right solution would be support backing links in bdrv_debug_breakpoint()
>> and bdrv_debug_remove_breakpoint() for the case when there is no file child.
>>
>> But being unsure about right behavior, I've decided to adjust the test.
>>
>> What about just do both  add/remove breakpoint through blkdebug node-name, to
>> make it less weird?
> 
> Yes, I was mostly wondering about the "set constant name for
> breakpoints".  It doesn't seem necessary to me;

It's just to avoid creating extra parameters or something. The current code doesn't allow
creating several breakpoints in one blkdebug anyway (as @drive is used as the breakpoint
name), so there is no reason for different names at all.

I'll improve the commit message to make it clear.

> if you use node names,
> you could address the blkdebug node directly, I would think.
> 
> Max
> 
>>> Wouldn't it be better to let the user specify the breakpoint name?
>>
>> it's not needed now. with current naming we can have one break-point in device,
>> so we don't need different names.
>>
>>>
>>> Max
>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    tests/qemu-iotests/iotests.py | 9 +++++----
>>>>    1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
>>>> index 76877ad584..c9779f432f 100644
>>>> --- a/tests/qemu-iotests/iotests.py
>>>> +++ b/tests/qemu-iotests/iotests.py
>>>> @@ -415,11 +415,11 @@ class VM(qtest.QEMUQtestMachine):
>>>>                self.pause_drive(drive, "write_aio")
>>>>                return
>>>>            self.qmp('human-monitor-command',
>>>> -                    command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
>>>> +                    command_line='qemu-io %s "break %s bp_0"' % (drive, event))
>>>>    
>>>>        def resume_drive(self, drive):
>>>>            self.qmp('human-monitor-command',
>>>> -                    command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
>>>> +                    command_line='qemu-io %s "remove_break bp_0"' % (drive))
>>>>    
>>>>        def hmp_qemu_io(self, drive, cmd):
>>>>            '''Write to a given drive using an HMP command'''
>>>> @@ -543,13 +543,14 @@ class QMPTestCase(unittest.TestCase):
>>>>            self.assertEqual(self.vm.flatten_qmp_object(json.loads(json_filename[5:])),
>>>>                             self.vm.flatten_qmp_object(reference))
>>>>    
>>>> -    def cancel_and_wait(self, drive='drive0', force=False, resume=False):
>>>> +    def cancel_and_wait(self, drive='drive0', force=False, resume=False,
>>>> +                        resume_node=None):
>>>>            '''Cancel a block job and wait for it to finish, returning the event'''
>>>>            result = self.vm.qmp('block-job-cancel', device=drive, force=force)
>>>>            self.assert_qmp(result, 'return', {})
>>>>    
>>>>            if resume:
>>>> -            self.vm.resume_drive(drive)
>>>> +            self.vm.resume_drive(resume_node or drive)
>>>>    
>>>>            cancelled = False
>>>>            result = None
>>>>
>>>
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2019-01-23 13:22         ` Vladimir Sementsov-Ogievskiy
@ 2019-01-23 13:31           ` Vladimir Sementsov-Ogievskiy
  2019-01-23 13:33             ` Max Reitz
  0 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-23 13:31 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

23.01.2019 16:22, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 16:11, Max Reitz wrote:
>> On 14.01.19 17:06, Vladimir Sementsov-Ogievskiy wrote:
>>> 14.01.2019 17:46, Max Reitz wrote:
>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>>> After node graph changes, we may not be able to resume_drive by device
>>>>> name (backing files are not recursively searched). So, lets allow to
>>>>> resume by node-name. Set constant name for breakpoints, to avoid
>>>>> introducing extra parameters.
>>>>
>>>> Hm, I don't quite understand this reason.  Is this so you can create
>>>> breakpoints on one node (which falls through to the first blkdebug node)
>>>> and then remove them from another (falling through to the same blkdebug
>>>> node)?
>>>
>>> add/remove breakpoint goes through ->file children, but my filter links
>>> active disk as ->backing. So, before block-job start we can insert breakpoint
>>> by device name. But then, when filter inserted, we can't remove breakpoint,
>>> because my filter hides blkdebug with active disk under ->backing link.
>>>
>>> Maybe, right solution would be support backing links in bdrv_debug_breakpoint()
>>> and bdrv_debug_remove_breakpoint() for the case when there is no file child.
>>>
>>> But being unsure about right behavior, I've decided to adjust the test.
>>>
>>> What about just do both  add/remove breakpoint through blkdebug node-name, to
>>> make it less weird?
>>
>> Yes, I was mostly wondering about the "set constant name for
>> breakpoints".  It doesn't seem necessary to me;
> 
> It's just to not create extra parameters or something. Current code don't allow
> to create several breakpoints in one blkdebug anyway (as @drive is used as
> breakpoint name), so there is no reason for different names at all.
> 
> I'll improve commit message to make it clean.
> 

Hmmm.

Or, finally, maybe it is better to teach bdrv_debug_breakpoint() and
bdrv_debug_remove_breakpoint() to skip filters with a backing child?


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name
  2019-01-23 13:31           ` Vladimir Sementsov-Ogievskiy
@ 2019-01-23 13:33             ` Max Reitz
  0 siblings, 0 replies; 54+ messages in thread
From: Max Reitz @ 2019-01-23 13:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow


On 23.01.19 14:31, Vladimir Sementsov-Ogievskiy wrote:
> 23.01.2019 16:22, Vladimir Sementsov-Ogievskiy wrote:
>> 16.01.2019 16:11, Max Reitz wrote:
>>> On 14.01.19 17:06, Vladimir Sementsov-Ogievskiy wrote:
>>>> 14.01.2019 17:46, Max Reitz wrote:
>>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> After node graph changes, we may not be able to resume_drive by device
>>>>>> name (backing files are not recursively searched). So, lets allow to
>>>>>> resume by node-name. Set constant name for breakpoints, to avoid
>>>>>> introducing extra parameters.
>>>>>
>>>>> Hm, I don't quite understand this reason.  Is this so you can create
>>>>> breakpoints on one node (which falls through to the first blkdebug node)
>>>>> and then remove them from another (falling through to the same blkdebug
>>>>> node)?
>>>>
>>>> add/remove breakpoint goes through ->file children, but my filter links
>>>> active disk as ->backing. So, before block-job start we can insert breakpoint
>>>> by device name. But then, when filter inserted, we can't remove breakpoint,
>>>> because my filter hides blkdebug with active disk under ->backing link.
>>>>
>>>> Maybe, right solution would be support backing links in bdrv_debug_breakpoint()
>>>> and bdrv_debug_remove_breakpoint() for the case when there is no file child.
>>>>
>>>> But being unsure about right behavior, I've decided to adjust the test.
>>>>
>>>> What about just do both  add/remove breakpoint through blkdebug node-name, to
>>>> make it less weird?
>>>
>>> Yes, I was mostly wondering about the "set constant name for
>>> breakpoints".  It doesn't seem necessary to me;
>>
>> It's just to not create extra parameters or something. Current code don't allow
>> to create several breakpoints in one blkdebug anyway (as @drive is used as
>> breakpoint name), so there is no reason for different names at all.
>>
>> I'll improve commit message to make it clean.
>>
> 
> Hmmm.
> 
> Or, finally, may be better to teach bdrv_debug_breakpoint() and
> bdrv_debug_remove_breakpoint() to skip filters with backing child?

I suppose that would be a question for my "Deal with filters" series.
It does make sense to me, though.

Max




* Re: [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
  2019-01-18 12:05       ` Max Reitz
@ 2019-01-23 13:47         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-23 13:47 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

18.01.2019 15:05, Max Reitz wrote:
> On 17.01.19 13:13, Vladimir Sementsov-Ogievskiy wrote:
>> 16.01.2019 19:02, Max Reitz wrote:
>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>> Backup-top filter does copy-before-write operation. It should be
>>>> inserted above active disk and has a target node for CBW, like the
>>>> following:
>>>>
>>>>       +-------+
>>>>       | Guest |
>>>>       +---+---+
>>>>           |r,w
>>>>           v
>>>>       +---+-----------+  target   +---------------+
>>>>       | backup_top    |---------->| target(qcow2) |
>>>>       +---+-----------+   CBW     +---+-----------+
>>>>           |
>>>> backing |r,w
>>>>           v
>>>>       +---+---------+
>>>>       | Active disk |
>>>>       +-------------+
>>>>
>>>> The driver will be used in backup instead of write-notifiers.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/backup-top.h  |  43 +++++++
>>>>    block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
>>>>    block/Makefile.objs |   2 +
>>>>    3 files changed, 351 insertions(+)
>>>>    create mode 100644 block/backup-top.h
>>>>    create mode 100644 block/backup-top.c
>>>
>>> Looks good to me overall, comments below:
>>>
>>
>> [..]
>>
>>   >> +static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
>>   >> +                                          uint64_t bytes)
>>
>> [..]
>>
>>>> +
>>>> +    /*
>>>> +     * Features to be implemented:
>>>> +     * F3. parallelize copying loop
>>>> +     * F4. detect zeros
>>>
>>> Isn't there detect-zeroes for that?
>>
>> Hmm interesting note. I've placed F4 here, as we do have detecting zeros by hand in
>> current backup code. Should we drop it from backup, or switch to just enabling
>> detect-zeros on target?
> 
> Off the top of my head I don't know a reason why we wouldn't just set
> detect-zeroes.  Maybe because there may be other writers to target which
> should not use that setting?  But that scenario just seems wrong to me.
> 
>>>> +     * F5. use block_status ?
>>>> +     * F6. don't copy clusters which are already cached by COR [see F1]
>>>> +     * F7. if target is ram-cache and it is full, there should be a possibility
>>>> +     *     to drop not necessary data (cached by COR [see F1]) to handle CBW
>>>> +     *     fast.
>>>
>>> Another idea: Read all dirty data from bs->backing (in parallel, more or
>>> less), and then perform all writes (to bs->backing and s->target) in
>>> parallel as well.
>>>
>>> (So the writes do not have to wait on one another)
>>
>> Yes, we can continue guest write after read part of CBW, but we can't do it always,
>> as we want to limit RAM usage to store all these in-flight requests.
> 
> But this code here already allocates a buffer to cover all of the area.

Yes, like the current backup does with write notifiers. Yes, we don't have an exact RAM
limit, and possibly that should be fixed. But we at least have the guarantee that we don't
have more CBW requests than guest in-flight requests. And we can hope that in the bad case,
with a slow backup target and a high enough write load in the guest, our backup will slow
down guest writes, and the guest will not create more and more parallel requests until the
older ones have finished. Hmm, and I think we do have a limit on parallel guest requests.

But if we complete guest writes after reading the old data into a RAM buffer without any
limit, it will lead to uncontrolled growth of RAM usage.

> 
>> And I think,
>> in scheme with ram-cache, ram-cache itself should be actually a kind of storage
>> for in-flight requests. And in this terminology, if ram-cache is full, we can't
>> proceed with guest write. It's the same as we reached limit on in-flight requests
>> count.
>>
>> [..]
>>
>>>> +
>>>> +static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
>>>> +                                                  int64_t offset, int bytes)
>>>
>>> (Indentation)
>>
>> looks like it was after renaming
>> fleecing_hook
>> to
>> backup_top
>>
>> will fix all
> 
> Ah :-)
> 
>> [..]
>>
>>>> +
>>>> +static int coroutine_fn backup_top_co_flush(BlockDriverState *bs)
>>>> +{
>>>> +    if (!bs->backing) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    return bdrv_co_flush(bs->backing->bs);
>>>
>>> Why not the target as well?
>>
>> Should we? It may be guest flush, why to flush backup target on it?
> 
> Hm...  Well, the current backup code didn't flush the target, and the
> mirror filter doesn't either.  OK then.
> 
> Max
> 
>> [..]
>>
>>>> +static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
>>>> +                                       const BdrvChildRole *role,
>>>> +                                       BlockReopenQueue *reopen_queue,
>>>> +                                       uint64_t perm, uint64_t shared,
>>>> +                                       uint64_t *nperm, uint64_t *nshared)
>>>
>>> (Indentation)
>>>
>>>> +{
>>>> +    bdrv_filter_default_perms(bs, c, role, reopen_queue, perm, shared, nperm,
>>>> +                              nshared);
>>>> +
>>>> +    if (role == &child_file) {
>>>> +        /*
>>>> +         * share write to target, to not interfere guest writes to it's disk
>>>
>>> *interfere with guest writes to its disk
>>>
>>>> +         * which will be in target backing chain
>>>> +         */
>>>> +        *nshared = *nshared | BLK_PERM_WRITE;
>>>> +        *nperm = *nperm | BLK_PERM_WRITE;
>>>
>>> Why do you need to take the write permission?  This filter does not
>>> perform any writes unless it receives writes, right?  (So
>>> bdrv_filter_default_perms() should take care of this.)
>>
>> Your commit shed a bit of light on permission system for me) Ok, I'll check, if
>> that work without PERM_WRITE here.
>>
>>>
>>>> +    } else {
>>>> +        *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
>>>> +    }
>>>> +}
>>>> +
>>
>> Thank you! For other comments: argee or will-try
>>
>>
> 
> 


-- 
Best regards,
Vladimir


* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-23  8:20             ` Vladimir Sementsov-Ogievskiy
  2019-01-23 13:19               ` Max Reitz
@ 2019-01-23 14:36               ` Eric Blake
  2019-01-24 14:20                 ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 54+ messages in thread
From: Eric Blake @ 2019-01-23 14:36 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, jsnow


On 1/23/19 2:20 AM, Vladimir Sementsov-Ogievskiy wrote:

>>>>>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>>>>>
>>>>>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>>>>>
>>>>>>> +
>>>>>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>>>>>
>>>>>> Same here.
>>>>>
>>>>> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
>>>>>
>>>>> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?
>>>>
>>>> Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
>>>> offset + bytes), i.e. where "offset + bytes" is the first clean offset?
>>>
>>> oops, you are right. then I need
>>> uint64_t last_cluster = (offset + bytes - 1) / job->cluster_size;
>>
>> That, or you just use a rounding up division and rename it from
>> last_cluster to end_cluster or first_clean_cluster or something (and
>> subsequently drop the +1s).
> 
> This will not work, as ((offset + bytes) / job->cluster_size) is not first clean cluster
> or end cluster. It's a cluster, where is first clean bit located, but it may have dirty
> bits too (or, may not).
> 
> So, to rewrite based on end_cluster, it should be calculated as
> 
> (offset + bytes - 1) / job->cluster_size + 1
> 
> and, I'm going to do so, one "+1" instead of two, and, may be, a bit more understandable.

Better is to use the macros in osdep.h, such as QEMU_ALIGN_UP/DOWN, to
make the intent of the code easier to read than having to closely check
which operation is being performed to see if it makes sense in context.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




* Re: [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup
  2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
@ 2019-01-23 15:26 ` no-reply
  11 siblings, 0 replies; 54+ messages in thread
From: no-reply @ 2019-01-23 15:26 UTC (permalink / raw)
  To: vsementsov; +Cc: fam, qemu-block, qemu-devel

Patchew URL: https://patchew.org/QEMU/20181229122027.42245-1-vsementsov@virtuozzo.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=14
=== TEST SCRIPT END ===

  CC      scsi/pr-manager-stub.o
  CC      block/curl.o
/tmp/qemu-test/src/block/sheepdog.c: In function 'find_vdi_name':
/tmp/qemu-test/src/block/sheepdog.c:1239:5: error: 'strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
     strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors


The full log is available at
http://patchew.org/logs/20181229122027.42245-1-vsementsov@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap
  2019-01-23 14:36               ` Eric Blake
@ 2019-01-24 14:20                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-24 14:20 UTC (permalink / raw)
  To: Eric Blake, Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, jsnow

23.01.2019 17:36, Eric Blake wrote:
> On 1/23/19 2:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>>>>>>> +        hbitmap_set(job->copy_bitmap, cluster, last_cluster - cluster + 1);
>>>>>>>
>>>>>>> Why the +1?  Shouldn't the division for last_cluster round up instead?
>>>>>>>
>>>>>>>> +
>>>>>>>> +        offset = (last_cluster + 1) * job->cluster_size;
>>>>>>>
>>>>>>> Same here.
>>>>>>
>>>>>> last cluster is not "end", but it's last dirty cluster. so number of dirty clusters is last_cluster - cluster + 1, and next offset is calculated through +1 too.
>>>>>>
>>>>>> If I round up division result, I'll get last for most cases, but "end" (next after the last), for the case when offset % job->cluster_size == 0, so, how to use it?
>>>>>
>>>>> Doesn't bdrv_dirty_bitmap_next_dirty_area() return a range [offset,
>>>>> offset + bytes), i.e. where "offset + bytes" is the first clean offset?
>>>>
>>>> oops, you are right. then I need
>>>> uint64_t last_cluster = (offset + bytes - 1) / job->cluster_size;
>>>
>>> That, or you just use a rounding up division and rename it from
>>> last_cluster to end_cluster or first_clean_cluster or something (and
>>> subsequently drop the +1s).
>>
>> This will not work, as ((offset + bytes) / job->cluster_size) is not the first clean
>> cluster or the end cluster. It's the cluster where the first clean bit is located, but
>> it may also have dirty bits (or may not).
>>
>> So, to rewrite based on end_cluster, it should be calculated as
>>
>> (offset + bytes - 1) / job->cluster_size + 1
>>
>> and I'm going to do so: one "+1" instead of two, and maybe a bit more understandable.
> 
> Better is to use the macros in osdep.h, such as QEMU_ALIGN_UP/DOWN, to
> make the intent of the code easier to read than having to closely check
> which operation is being performed to see if it makes sense in context.
> 

Yes, thanks. Exactly:
uint64_t end_cluster = DIV_ROUND_UP(offset + bytes, job->cluster_size);


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-18 14:56   ` Max Reitz
@ 2019-01-28 11:29     ` Vladimir Sementsov-Ogievskiy
  2019-01-28 15:59       ` Max Reitz
  0 siblings, 1 reply; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-28 11:29 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

18.01.2019 17:56, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Drop write notifiers and use filter node instead. Changes:
>>
>> 1. copy-before-writes now handled by filter node, so, drop all
>>     is_write_notifier arguments.
>>
>> 2. we don't have intersecting requests, so their handling is dropped.
>> Instead, synchronization works as follows:
>> when backup or backup-top starts copying of some area it firstly
>> clears copy-bitmap bits, and nobody touches areas, not marked with
>> dirty bits in copy-bitmap, so there is no intersection. Also, backup
>> job copy operations are surrounded by bdrv region lock, which is
>> actually serializing request, to not interfer with guest writes and
>> not read changed data from source (before reading we clear
>> corresponding bit in copy-bitmap, so, this area is not more handled by
>> backup-top).
>>
>> 3. To sync with in-flight requests we now just drain hook node, we
>> don't need rw-lock.
>>
>> 4. After the whole backup loop (top, full, incremental modes), we need
>> to check for not copied clusters, which were under backup-top operation
>> and we skipped them, but backup-top operation failed, error returned to
>> the guest and dirty bits set back.
>>
>> 5. Don't create additional blk, use backup-top children for copy
>> operations.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
>>   1 file changed, 149 insertions(+), 136 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 88c0242b4e..e332909fb7 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
> 
> [...]
> 
>> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>>   static void backup_clean(Job *job)
>>   {
>>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>> -    assert(s->target);
>> -    blk_unref(s->target);
>> +
>> +    /* We must clean it to not crash in backup_drain. */
>>       s->target = NULL;
> 
> Why not set s->source to NULL along with it?  It makes sense if you're
> going to drop the backup-top node because both of these are its children.

agree.

> 
>>   
>>       if (s->copy_bitmap) {
>>           hbitmap_free(s->copy_bitmap);
>>           s->copy_bitmap = NULL;
>>       }
>> +
>> +    bdrv_backup_top_drop(s->backup_top);
>>   }
> 
> [...]
> 
>> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>       bool error_is_read;
>>       int64_t offset;
>>       HBitmapIter hbi;
>> +    void *lock = NULL;
>>   
>>       hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
>> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
>> +    while (hbitmap_count(job->copy_bitmap)) {
>> +        offset = hbitmap_iter_next(&hbi);
>> +        if (offset == -1) {
>> +            /*
>> +             * we may have skipped some clusters, which were handled by
>> +             * backup-top, but failed and finished by returning error to
>> +             * the guest and set dirty bit back.
>> +             */
>> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
>> +            offset = hbitmap_iter_next(&hbi);
>> +            assert(offset);
> 
> I think you want to assert "offset >= 0".
> 
>> +        }
>> +
>> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
>> +        /*
>> +         * Dirty bit is set, which means that there are no in-flight
>> +         * write requests on this area. We must succeed.
>> +         */
>> +        assert(lock);
> 
> I'm not sure that is true right now, but more on that below in backup_run().
> 
>> +
>>           do {
>>               if (yield_and_check(job)) {
>> +                bdrv_co_unlock(lock);
>>                   return 0;
>>               }
>> -            ret = backup_do_cow(job, offset,
>> -                                job->cluster_size, &error_is_read, false);
>> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>>               if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>>                              BLOCK_ERROR_ACTION_REPORT)
>>               {
>> +                bdrv_co_unlock(lock);
>>                   return ret;
>>               }
>>           } while (ret < 0);
>> +
>> +        bdrv_co_unlock(lock);
>> +        lock = NULL;
> 
> This statement seems unnecessary here.
> 
>>       }
>>   
>>       return 0;
> 
> [...]
> 
>> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>           hbitmap_set(s->copy_bitmap, 0, s->len);
>>       }
>>   
>> -    s->before_write.notify = backup_before_write_notify;
>> -    bdrv_add_before_write_notifier(bs, &s->before_write);
>> -
>>       if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>>           /* All bits are set in copy_bitmap to allow any cluster to be copied.
>>            * This does not actually require them to be copied. */
>>           while (!job_is_cancelled(job)) {
>> -            /* Yield until the job is cancelled.  We just let our before_write
>> -             * notify callback service CoW requests. */
>> +            /*
>> +             * Yield until the job is cancelled.  We just let our backup-top
>> +             * fileter driver service CbW requests.
> 
> *filter
> 
>> +             */
>>               job_yield(job);
>>           }
>>       } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>>           ret = backup_run_incremental(s);
>>       } else {
> 
> [...]
> 
>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>               if (alloced < 0) {
>>                   ret = alloced;
>>               } else {
>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
>> +                    trace_backup_do_cow_skip(job, offset);
>> +                    continue; /* already copied */
>> +                }
>> +                if (!lock) {
>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
>> +                    /*
>> +                     * Dirty bit is set, which means that there are no in-flight
>> +                     * write requests on this area. We must succeed.
>> +                     */
>> +                    assert(lock);
> 
> What if I have a different parent node for the source that issues
> concurrent writes?  This used to work fine because the before_write
> notifier would still work.  After this patch, that would be broken
> because those writes would not cause a CbW.

But how could you have such a different parent node? After appending the
filter, there should be no such nodes. And I think that during backup it should be
forbidden to attach new parents to the source that bypass the filter, as that
definitely breaks what the filter does. And the same applies to other block jobs
with their filters: if we appended a filter, we don't want someone else to write
around it.

> 
> That's not so bad because we just have to make sure that all writes go
> through the backup-top node.  That would make this assertion valid
> again, too.  But that means we cannot share PERM_WRITE; see [1].

But we don't share PERM_WRITE on source in backup_top, only on target.

> 
>> +                }
>>                   ret = backup_do_cow(s, offset, s->cluster_size,
>> -                                    &error_is_read, false);
>> +                                    &error_is_read);
>>               }
>>               if (ret < 0) {
>>                   /* Depending on error action, fail now or retry cluster */
>> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>                       break;
>>                   } else {
>>                       offset -= s->cluster_size;
>> +                    retry = true;
>>                       continue;
>>                   }
>>               }
>>           }
>> +        if (lock) {
>> +            bdrv_co_unlock(lock);
>> +            lock = NULL;
>> +        }
>> +        if (ret == 0 && !job_is_cancelled(job) &&
>> +            hbitmap_count(s->copy_bitmap))
>> +        {
>> +            /*
>> +             * we may have skipped some clusters, which were handled by
>> +             * backup-top, but failed and finished by returning error to
>> +             * the guest and set dirty bit back.
> 
> So it's a matter of a race?
> 
>> +             */
>> +            goto iteration;
>> +        }
> 
> Why not wrap everything in a do {} while (ret == 0 && !job_is...)
> instead?  Because it would mean reindenting everything?

I don't remember, but I assume so. And this code is "to be refactored" anyway;
I want FULL/TOP/INCREMENTAL to all go through (mostly) the same code path.

> 
>>       }
>>   
>> -    notifier_with_return_remove(&s->before_write);
>> +    /* wait pending CBW operations in backup-top */
>> +    bdrv_drain(s->backup_top);
>>   
>> -    /* wait until pending backup_do_cow() calls have completed */
>> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
>> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
>> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
>> +    job_progress_update(job, ret + backup_top_progress -
> 
> Why the "ret"?

oops, it looks like a copy-paste bug ("ret" is reasonable in backup_do_cow())

> 
>> +                        s->backup_top_progress);
>> +    s->backup_top_progress = backup_top_progress;
> 
> So the backup-top progress is ignored during basically all of the block
> job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.
> 
> Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
>   (And the while() loop of MODE_NONE)


It is done in backup_do_cow(), so FULL and TOP are covered.

But you are right that MODE_NONE has a problem here. And just updating it in a
while loop would not work, as I doubt job_yield() returns before the job
finishes or there is user interaction like pause/continue/cancel.

So now it looks better to call job_progress_update() from backup_top directly
and drop this hack.

> 
>>   
>>       return ret;
>>   }
>> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>       int ret;
>>       int64_t cluster_size;
>>       HBitmap *copy_bitmap = NULL;
>> +    BlockDriverState *backup_top = NULL;
>> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
> 
> BLK_PERM_ALL & ~BLK_PERM_RESIZE?
> 
> [1] But we probably do not want to share either PERM_WRITE or
> PERM_WRITE_UNCHANGED because during the duration of the backup,
> everything should go through the backup-top filter (not sure about
> PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
> node should enforce in backup_top_child_perm()?

It's not the shared perm of backup_top, it's the shared perm of the block job's
common.blk, which is used only because "job->len is fixed, so we can't allow
resize", so this part is not changed.

So yes, the problem you mean by [1] is about backup_top_child_perm(), where we
share PERM_WRITE. And as the comment describes, we must share this write
permission, otherwise we break guest writes. We share PERM_WRITE in backup_top
to force its target child to share PERM_WRITE on its backing file, as the
backing of the target is the source.

But again, we share PERM_WRITE only on the target, and it is shared in the
current code too.

> 
>>   
>>       assert(bs);
>>       assert(target);
>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>   
>>       copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>>   
>> -    /* job->len is fixed, so we can't allow resize */
>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
>> -                           BLK_PERM_CONSISTENT_READ,
>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>> -                           speed, creation_flags, cb, opaque, errp);
>> -    if (!job) {
>> +    /*
>> +     * bdrv_get_device_name will not help to find device name starting from
>> +     * @bs after backup-top append,
> 
> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
> same result)

bdrv_get_device_name() ultimately goes through role->get_name, and only the
root role has this handler. After the append, we'll have a backing role instead
of a root one.

> 
>>                                       so let's calculate job_id before. Do
>> +     * it in the same way like block_job_create
>> +     */
>> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
>> +        job_id = bdrv_get_device_name(bs);
>> +    }
>> +
>> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
>> +    if (!backup_top) {
>>           goto error;
>>       }
>>   
>> -    /* The target must match the source in size, so no resize here either */
>> -    job->target = blk_new(BLK_PERM_WRITE,
>> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
>> -    ret = blk_insert_bs(job->target, target, errp);
>> -    if (ret < 0) {
>> +    /* job->len is fixed, so we can't allow resize */
>> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
> 
> Is there a reason you dropped PERM_CONSISTENT_READ here?

Because we don't use this blk for reads now; we read through the backup_top child.

> 
>> +                           all_except_resize, speed, creation_flags,
>> +                           cb, opaque, errp);
>> +    if (!job) {
>>           goto error;
>>       }
>>   
>> +    job->source = backup_top->backing;
>> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
> 
> This looks really ugly.  I think as long as the block job performs
> writes itself, it should use its own BlockBackend.

They are not BlockBackends, they are BdrvChildren. It was Kevin's idea to reuse
the filter's children in the backup job:

https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01017.html

Hmm, and this is also why I need PERM_WRITE in backup_top, to write to target.

> 
> Alternatively, I think it would make sense for the backup-top filter to
> offer functions that this job can use to issue writes to the target.
> Then the backup job would no longer need direct access to the target as
> a BdrvChild.
> 
>> +
>>       job->on_source_error = on_source_error;
>>       job->on_target_error = on_target_error;
>>       job->sync_mode = sync_mode;



-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-28 11:29     ` Vladimir Sementsov-Ogievskiy
@ 2019-01-28 15:59       ` Max Reitz
  2019-01-28 16:44         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-28 15:59 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow


On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
> 18.01.2019 17:56, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Drop write notifiers and use filter node instead. Changes:
>>>
>>> 1. copy-before-writes now handled by filter node, so, drop all
>>>     is_write_notifier arguments.
>>>
>>> 2. we don't have intersecting requests, so their handling is dropped.
>>> Instead, synchronization works as follows:
>>> when backup or backup-top starts copying of some area it firstly
>>> clears copy-bitmap bits, and nobody touches areas, not marked with
>>> dirty bits in copy-bitmap, so there is no intersection. Also, backup
>>> job copy operations are surrounded by bdrv region lock, which is
>>> actually serializing request, to not interfer with guest writes and
>>> not read changed data from source (before reading we clear
>>> corresponding bit in copy-bitmap, so, this area is not more handled by
>>> backup-top).
>>>
>>> 3. To sync with in-flight requests we now just drain hook node, we
>>> don't need rw-lock.
>>>
>>> 4. After the whole backup loop (top, full, incremental modes), we need
>>> to check for not copied clusters, which were under backup-top operation
>>> and we skipped them, but backup-top operation failed, error returned to
>>> the guest and dirty bits set back.
>>>
>>> 5. Don't create additional blk, use backup-top children for copy
>>> operations.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
>>>   1 file changed, 149 insertions(+), 136 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 88c0242b4e..e332909fb7 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>
>> [...]
>>
>>> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>>>   static void backup_clean(Job *job)
>>>   {
>>>       BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>>> -    assert(s->target);
>>> -    blk_unref(s->target);
>>> +
>>> +    /* We must clean it to not crash in backup_drain. */
>>>       s->target = NULL;
>>
>> Why not set s->source to NULL along with it?  It makes sense if you're
>> going to drop the backup-top node because both of these are its children.
> 
> agree.
> 
>>
>>>   
>>>       if (s->copy_bitmap) {
>>>           hbitmap_free(s->copy_bitmap);
>>>           s->copy_bitmap = NULL;
>>>       }
>>> +
>>> +    bdrv_backup_top_drop(s->backup_top);
>>>   }
>>
>> [...]
>>
>>> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>>       bool error_is_read;
>>>       int64_t offset;
>>>       HBitmapIter hbi;
>>> +    void *lock = NULL;
>>>   
>>>       hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
>>> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
>>> +    while (hbitmap_count(job->copy_bitmap)) {
>>> +        offset = hbitmap_iter_next(&hbi);
>>> +        if (offset == -1) {
>>> +            /*
>>> +             * we may have skipped some clusters, which were handled by
>>> +             * backup-top, but failed and finished by returning error to
>>> +             * the guest and set dirty bit back.
>>> +             */
>>> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
>>> +            offset = hbitmap_iter_next(&hbi);
>>> +            assert(offset);
>>
>> I think you want to assert "offset >= 0".
>>
>>> +        }
>>> +
>>> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
>>> +        /*
>>> +         * Dirty bit is set, which means that there are no in-flight
>>> +         * write requests on this area. We must succeed.
>>> +         */
>>> +        assert(lock);
>>
>> I'm not sure that is true right now, but more on that below in backup_run().
>>
>>> +
>>>           do {
>>>               if (yield_and_check(job)) {
>>> +                bdrv_co_unlock(lock);
>>>                   return 0;
>>>               }
>>> -            ret = backup_do_cow(job, offset,
>>> -                                job->cluster_size, &error_is_read, false);
>>> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>>>               if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>>>                              BLOCK_ERROR_ACTION_REPORT)
>>>               {
>>> +                bdrv_co_unlock(lock);
>>>                   return ret;
>>>               }
>>>           } while (ret < 0);
>>> +
>>> +        bdrv_co_unlock(lock);
>>> +        lock = NULL;
>>
>> This statement seems unnecessary here.
>>
>>>       }
>>>   
>>>       return 0;
>>
>> [...]
>>
>>> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>           hbitmap_set(s->copy_bitmap, 0, s->len);
>>>       }
>>>   
>>> -    s->before_write.notify = backup_before_write_notify;
>>> -    bdrv_add_before_write_notifier(bs, &s->before_write);
>>> -
>>>       if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>>>           /* All bits are set in copy_bitmap to allow any cluster to be copied.
>>>            * This does not actually require them to be copied. */
>>>           while (!job_is_cancelled(job)) {
>>> -            /* Yield until the job is cancelled.  We just let our before_write
>>> -             * notify callback service CoW requests. */
>>> +            /*
>>> +             * Yield until the job is cancelled.  We just let our backup-top
>>> +             * fileter driver service CbW requests.
>>
>> *filter
>>
>>> +             */
>>>               job_yield(job);
>>>           }
>>>       } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>>>           ret = backup_run_incremental(s);
>>>       } else {
>>
>> [...]
>>
>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>               if (alloced < 0) {
>>>                   ret = alloced;
>>>               } else {
>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
>>> +                    trace_backup_do_cow_skip(job, offset);
>>> +                    continue; /* already copied */
>>> +                }
>>> +                if (!lock) {
>>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
>>> +                    /*
>>> +                     * Dirty bit is set, which means that there are no in-flight
>>> +                     * write requests on this area. We must succeed.
>>> +                     */
>>> +                    assert(lock);
>>
>> What if I have a different parent node for the source that issues
>> concurrent writes?  This used to work fine because the before_write
>> notifier would still work.  After this patch, that would be broken
>> because those writes would not cause a CbW.
> 
> But how could you have such a different parent node? After appending the
> filter, there should be no such nodes.

Unless you append them afterwards:

> And I think that during backup it should be forbidden to attach new parents
> to the source that bypass the filter, as that definitely breaks what the
> filter does.

Agreed, but then this needs to be implemented.

> And the same applies to other block jobs with their filters: if we appended
> a filter, we don't want someone else to write around it.
> 
>>
>> That's not so bad because we just have to make sure that all writes go
>> through the backup-top node.  That would make this assertion valid
>> again, too.  But that means we cannot share PERM_WRITE; see [1].
> 
> But we don't share PERM_WRITE on source in backup_top, only on target.

Are you sure?  The job itself shares it, and the filter shares it, too,
as far as I can see.  It uses bdrv_filter_default_perms(), and that does
seem to share PERM_WRITE.

>>
>>> +                }
>>>                   ret = backup_do_cow(s, offset, s->cluster_size,
>>> -                                    &error_is_read, false);
>>> +                                    &error_is_read);
>>>               }
>>>               if (ret < 0) {
>>>                   /* Depending on error action, fail now or retry cluster */
>>> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>                       break;
>>>                   } else {
>>>                       offset -= s->cluster_size;
>>> +                    retry = true;
>>>                       continue;
>>>                   }
>>>               }
>>>           }
>>> +        if (lock) {
>>> +            bdrv_co_unlock(lock);
>>> +            lock = NULL;
>>> +        }
>>> +        if (ret == 0 && !job_is_cancelled(job) &&
>>> +            hbitmap_count(s->copy_bitmap))
>>> +        {
>>> +            /*
>>> +             * we may have skipped some clusters, which were handled by
>>> +             * backup-top, but failed and finished by returning error to
>>> +             * the guest and set dirty bit back.
>>
>> So it's a matter of a race?
>>
>>> +             */
>>> +            goto iteration;
>>> +        }
>>
>> Why not wrap everything in a do {} while (ret == 0 && !job_is...)
>> instead?  Because it would mean reindenting everything?
> 
> I don't remember, but I assume so. And this code is "to be refactored"
> anyway; I want FULL/TOP/INCREMENTAL to all go through (mostly) the same code
> path.

Hm, well, if you want to refactor it later anyway...  But I don't like
gotos that go backwards very much, unless there is a good reason to have
them (and there isn't here).

>>>       }
>>>   
>>> -    notifier_with_return_remove(&s->before_write);
>>> +    /* wait pending CBW operations in backup-top */
>>> +    bdrv_drain(s->backup_top);
>>>   
>>> -    /* wait until pending backup_do_cow() calls have completed */
>>> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
>>> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
>>> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
>>> +    job_progress_update(job, ret + backup_top_progress -
>>
>> Why the "ret"?
> 
> oops, it looks like a copy-paste bug ("ret" is reasonable in backup_do_cow())
> 
>>
>>> +                        s->backup_top_progress);
>>> +    s->backup_top_progress = backup_top_progress;
>>
>> So the backup-top progress is ignored during basically all of the block
>> job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.
>>
>> Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
>>   (And the while() loop of MODE_NONE)
> 
> 
> It is done in backup_do_cow(), so FULL and TOP are covered.
> 
> But you are right that MODE_NONE has a problem here. And just updating it in
> a while loop would not work, as I doubt job_yield() returns before the job
> finishes or there is user interaction like pause/continue/cancel.
> 
> So now it looks better to call job_progress_update() from backup_top directly
> and drop this hack.

Hmmm...  I don't think job_*() calls belong in backup_top.  How about
adding a callback to bdrv_backup_top_append()?

>>>   
>>>       return ret;
>>>   }
>>> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>       int ret;
>>>       int64_t cluster_size;
>>>       HBitmap *copy_bitmap = NULL;
>>> +    BlockDriverState *backup_top = NULL;
>>> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
>>
>> BLK_PERM_ALL & ~BLK_PERM_RESIZE?
>>
>> [1] But we probably do not want to share either PERM_WRITE or
>> PERM_WRITE_UNCHANGED because during the duration of the backup,
>> everything should go through the backup-top filter (not sure about
>> PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
>> node should enforce in backup_top_child_perm()?
> 
> It's not the shared perm of backup_top, it's the shared perm of the block
> job's common.blk, which is used only because "job->len is fixed, so we can't
> allow resize", so this part is not changed.
> 
> So yes, the problem you mean by [1] is about backup_top_child_perm(), where
> we share PERM_WRITE. And as the comment describes, we must share this write
> permission, otherwise we break guest writes.

For the target, yes, but the problem is sharing it on the source.

> We share PERM_WRITE in backup_top to force its target child to share
> PERM_WRITE on its backing file, as the backing of the target is the source.
> 
> But again, we share PERM_WRITE only on the target, and it is shared in the
> current code too.

I'm not so sure whether PERM_WRITE is shared only on the target.

>>
>>>   
>>>       assert(bs);
>>>       assert(target);
>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>   
>>>       copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>>>   
>>> -    /* job->len is fixed, so we can't allow resize */
>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
>>> -                           BLK_PERM_CONSISTENT_READ,
>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>>> -                           speed, creation_flags, cb, opaque, errp);
>>> -    if (!job) {
>>> +    /*
>>> +     * bdrv_get_device_name will not help to find device name starting from
>>> +     * @bs after backup-top append,
>>
>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
>> same result)
> 
> bdrv_get_device_name goes finally through role->get_name, and only root role has
> this handler. After append we'll have backing role instead of root.

Ah, I see, I asked the wrong question.

Why is block_job_create() called on bs and not on backup_top?  mirror
calls it on mirror_top_bs.

>>>                                       so let's calculate job_id before. Do
>>> +     * it in the same way like block_job_create
>>> +     */
>>> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
>>> +        job_id = bdrv_get_device_name(bs);
>>> +    }
>>> +
>>> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
>>> +    if (!backup_top) {
>>>           goto error;
>>>       }
>>>   
>>> -    /* The target must match the source in size, so no resize here either */
>>> -    job->target = blk_new(BLK_PERM_WRITE,
>>> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
>>> -    ret = blk_insert_bs(job->target, target, errp);
>>> -    if (ret < 0) {
>>> +    /* job->len is fixed, so we can't allow resize */
>>> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
>>
>> Is there a reason you dropped PERM_CONSISTENT_READ here?
> 
> Because, we don't use this blk for read now, we read through backup_top child.

Makes sense.

>>> +                           all_except_resize, speed, creation_flags,
>>> +                           cb, opaque, errp);
>>> +    if (!job) {
>>>           goto error;
>>>       }
>>>   
>>> +    job->source = backup_top->backing;
>>> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
>>
>> This looks really ugly.  I think as long as the block job performs
>> writes itself, it should use its own BlockBackend.
> 
> They are not BlockBackends, they are BdrvChildren.

Exactly, which is what I don't like.  They are children of a node that
is implemented in a different file, it looks weird to use them here.

> It was Kevin's idea to reuse filter's
> children in backup job:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01017.html

It's still ugly if backup_top is in a different file.  Well, maybe just
to me.

> Hmm, and this is also why I need PERM_WRITE in backup_top, to write to target.
> 
>>
>> Alternatively, I think it would make sense for the backup-top filter to
>> offer functions that this job can use to issue writes to the target.
>> Then the backup job would no longer need direct access to the target as
>> a BdrvChild.

So what would be the problem with this?

Max



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-28 15:59       ` Max Reitz
@ 2019-01-28 16:44         ` Vladimir Sementsov-Ogievskiy
  2019-01-28 16:53           ` Max Reitz
  2019-01-28 17:40           ` Kevin Wolf
  0 siblings, 2 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-28 16:44 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

28.01.2019 18:59, Max Reitz wrote:
> On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
>> 18.01.2019 17:56, Max Reitz wrote:
>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>>> Drop write notifiers and use filter node instead. Changes:
>>>>
>>>> 1. copy-before-writes now handled by filter node, so, drop all
>>>>      is_write_notifier arguments.
>>>>
>>>> 2. we don't have intersecting requests, so their handling is dropped.
>>>> Instead, synchronization works as follows:
>>>> when backup or backup-top starts copying of some area it firstly
>>>> clears copy-bitmap bits, and nobody touches areas, not marked with
>>>> dirty bits in copy-bitmap, so there is no intersection. Also, backup
>>>> job copy operations are surrounded by bdrv region lock, which is
>>>> actually serializing request, to not interfer with guest writes and
>>>> not read changed data from source (before reading we clear
>>>> corresponding bit in copy-bitmap, so, this area is not more handled by
>>>> backup-top).
>>>>
>>>> 3. To sync with in-flight requests we now just drain hook node, we
>>>> don't need rw-lock.
>>>>
>>>> 4. After the whole backup loop (top, full, incremental modes), we need
>>>> to check for not copied clusters, which were under backup-top operation
>>>> and we skipped them, but backup-top operation failed, error returned to
>>>> the guest and dirty bits set back.
>>>>
>>>> 5. Don't create additional blk, use backup-top children for copy
>>>> operations.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
>>>>    1 file changed, 149 insertions(+), 136 deletions(-)
>>>>
>>>> diff --git a/block/backup.c b/block/backup.c
>>>> index 88c0242b4e..e332909fb7 100644
>>>> --- a/block/backup.c
>>>> +++ b/block/backup.c
>>>
>>> [...]
>>>
>>>> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
>>>>    static void backup_clean(Job *job)
>>>>    {
>>>>        BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
>>>> -    assert(s->target);
>>>> -    blk_unref(s->target);
>>>> +
>>>> +    /* We must clean it to not crash in backup_drain. */
>>>>        s->target = NULL;
>>>
>>> Why not set s->source to NULL along with it?  It makes sense if you're
>>> going to drop the backup-top node because both of these are its children.
>>
>> agree.
>>
>>>
>>>>    
>>>>        if (s->copy_bitmap) {
>>>>            hbitmap_free(s->copy_bitmap);
>>>>            s->copy_bitmap = NULL;
>>>>        }
>>>> +
>>>> +    bdrv_backup_top_drop(s->backup_top);
>>>>    }
>>>
>>> [...]
>>>
>>>> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
>>>>        bool error_is_read;
>>>>        int64_t offset;
>>>>        HBitmapIter hbi;
>>>> +    void *lock = NULL;
>>>>    
>>>>        hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
>>>> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
>>>> +    while (hbitmap_count(job->copy_bitmap)) {
>>>> +        offset = hbitmap_iter_next(&hbi);
>>>> +        if (offset == -1) {
>>>> +            /*
>>>> +             * we may have skipped some clusters, which were handled by
>>>> +             * backup-top, but failed and finished by returning error to
>>>> +             * the guest and set dirty bit back.
>>>> +             */
>>>> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
>>>> +            offset = hbitmap_iter_next(&hbi);
>>>> +            assert(offset);
>>>
>>> I think you want to assert "offset >= 0".
>>>
>>>> +        }
>>>> +
>>>> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
>>>> +        /*
>>>> +         * Dirty bit is set, which means that there are no in-flight
>>>> +         * write requests on this area. We must succeed.
>>>> +         */
>>>> +        assert(lock);
>>>
>>> I'm not sure that is true right now, but more on that below in backup_run().
>>>
>>>> +
>>>>            do {
>>>>                if (yield_and_check(job)) {
>>>> +                bdrv_co_unlock(lock);
>>>>                    return 0;
>>>>                }
>>>> -            ret = backup_do_cow(job, offset,
>>>> -                                job->cluster_size, &error_is_read, false);
>>>> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
>>>>                if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
>>>>                               BLOCK_ERROR_ACTION_REPORT)
>>>>                {
>>>> +                bdrv_co_unlock(lock);
>>>>                    return ret;
>>>>                }
>>>>            } while (ret < 0);
>>>> +
>>>> +        bdrv_co_unlock(lock);
>>>> +        lock = NULL;
>>>
>>> This statement seems unnecessary here.
>>>
>>>>        }
>>>>    
>>>>        return 0;
>>>
>>> [...]
>>>
>>>> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>            hbitmap_set(s->copy_bitmap, 0, s->len);
>>>>        }
>>>>    
>>>> -    s->before_write.notify = backup_before_write_notify;
>>>> -    bdrv_add_before_write_notifier(bs, &s->before_write);
>>>> -
>>>>        if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
>>>>            /* All bits are set in copy_bitmap to allow any cluster to be copied.
>>>>             * This does not actually require them to be copied. */
>>>>            while (!job_is_cancelled(job)) {
>>>> -            /* Yield until the job is cancelled.  We just let our before_write
>>>> -             * notify callback service CoW requests. */
>>>> +            /*
>>>> +             * Yield until the job is cancelled.  We just let our backup-top
>>>> +             * fileter driver service CbW requests.
>>>
>>> *filter
>>>
>>>> +             */
>>>>                job_yield(job);
>>>>            }
>>>>        } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
>>>>            ret = backup_run_incremental(s);
>>>>        } else {
>>>
>>> [...]
>>>
>>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>                if (alloced < 0) {
>>>>                    ret = alloced;
>>>>                } else {
>>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
>>>> +                    trace_backup_do_cow_skip(job, offset);
>>>> +                    continue; /* already copied */
>>>> +                }
>>>> +                if (!lock) {
>>>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
>>>> +                    /*
>>>> +                     * Dirty bit is set, which means that there are no in-flight
>>>> +                     * write requests on this area. We must succeed.
>>>> +                     */
>>>> +                    assert(lock);
>>>
>>> What if I have a different parent node for the source that issues
>>> concurrent writes?  This used to work fine because the before_write
>>> notifier would still work.  After this patch, that would be broken
>>> because those writes would not cause a CbW.
>>
>> But haw could you have this different parent node? After appending filter,
>> there should not be such nodes.
> 
> Unless you append them afterwards:
> 
>> And I think, during backup it should be
>> forbidden to append new parents to source, ignoring filter, as it definitely
>> breaks what filter does.
> 
> Agreed, but then this needs to be implemented.
> 
>> And it applies to other block-job with their filters.
>> If we appended a filter, we don't want someone other to write omit our filter.
>>
>>>
>>> That's not so bad because we just have to make sure that all writes go
>>> through the backup-top node.  That would make this assertion valid
>>> again, too.  But that means we cannot share PERM_WRITE; see [1].
>>
>> But we don't share PERM_WRITE on source in backup_top, only on target.
> 
> Are you sure?  The job itself shares it, and the filter shares it, too,
> as far as I can see.  It uses bdrv_filter_default_perms(), and that does
> seem to share PERM_WRITE.

And in bdrv_filter_default_perms it does "*nshared = *nshared | BLK_PERM_WRITE"
only for child_file, which is the target. The source is child_backing.

> 
>>>
>>>> +                }
>>>>                    ret = backup_do_cow(s, offset, s->cluster_size,
>>>> -                                    &error_is_read, false);
>>>> +                                    &error_is_read);
>>>>                }
>>>>                if (ret < 0) {
>>>>                    /* Depending on error action, fail now or retry cluster */
>>>> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>                        break;
>>>>                    } else {
>>>>                        offset -= s->cluster_size;
>>>> +                    retry = true;
>>>>                        continue;
>>>>                    }
>>>>                }
>>>>            }
>>>> +        if (lock) {
>>>> +            bdrv_co_unlock(lock);
>>>> +            lock = NULL;
>>>> +        }
>>>> +        if (ret == 0 && !job_is_cancelled(job) &&
>>>> +            hbitmap_count(s->copy_bitmap))
>>>> +        {
>>>> +            /*
>>>> +             * we may have skipped some clusters, which were handled by
>>>> +             * backup-top, but failed and finished by returning error to
>>>> +             * the guest and set dirty bit back.
>>>
>>> So it's a matter of a race?
>>>
>>>> +             */
>>>> +            goto iteration;
>>>> +        }
>>>
>>> Why not wrap everything in a do {} while (ret == 0 && !job_is...)
>>> instead?  Because it would mean reindenting everything?
>>
>> Don't remember, but assume that yes. And this code is anyway "To be refactored",
>> I want all FULL/TOP/INCREMENTAL go through the same (mostly) code path.
> 
> Hm, well, if you want to refactor it later anyway...  But I don't like
> gotos that go backwards very much, unless there is a good reason to have
> them (and there isn't here).


Ok, not a real problem. Let's go on with do-while.


> 
>>>>        }
>>>>    
>>>> -    notifier_with_return_remove(&s->before_write);
>>>> +    /* wait pending CBW operations in backup-top */
>>>> +    bdrv_drain(s->backup_top);
>>>>    
>>>> -    /* wait until pending backup_do_cow() calls have completed */
>>>> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
>>>> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
>>>> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
>>>> +    job_progress_update(job, ret + backup_top_progress -
>>>
>>> Why the "ret"?
>>
>> oops, it looks like a copy-paste bug ("ret" is reasonable in backup_do_cow())
>>
>>>
>>>> +                        s->backup_top_progress);
>>>> +    s->backup_top_progress = backup_top_progress;
>>>
>>> So the backup-top progress is ignored during basically all of the block
>>> job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.
>>>
>>> Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
>>>    (And the while() loop of MODE_NONE)
>>
>>
>> It is done in backup_do_cow(), so FULL and TOP are covered.
>>
>> But you are right that MODE_NONE seems to have a problem about it.. And just updating it
>> in a while loop would not work, as I doubt that job_yield will return until job finish
>> or user interaction like pause/continue/cancel..
>>
>> So now, it looks better to call job_progress_update() from backup_top directly, and drop
>> this hack.
> 
> Hmmm...  I don't think job_*() calls belong in backup_top.  How about
> adding a callback to bdrv_backup_top_append()?

Ok for me at least as a temporary step.

> 
>>>>    
>>>>        return ret;
>>>>    }
>>>> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>>        int ret;
>>>>        int64_t cluster_size;
>>>>        HBitmap *copy_bitmap = NULL;
>>>> +    BlockDriverState *backup_top = NULL;
>>>> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>>> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
>>>
>>> BLK_PERM_ALL & ~BLK_PERM_RESIZE?
>>>
>>> [1] But we probably do not want to share either PERM_WRITE or
>>> PERM_WRITE_UNCHANGED because during the duration of the backup,
>>> everything should go through the backup-top filter (not sure about
>>> PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
>>> node should enforce in backup_top_child_perm()?
>>
>> It's not shared perm of backup_top, it's a shared perm of block-job common.blk, which is
>> used only to "job->len is fixed, so we can't allow resize", so this part is not changed.
>>
>> So yes, the problem you mean by [1] is about backup_top_child_perm() where we share PERM_WRITE.
>> And it is described by comment, we must share this write perm, otherwise we break guest writes.
> 
> For the target, yes, but the problem is sharing it on the source.
> 
>> We share PERM_WRITE in backup_top to force its target child share PERM_WRITE on its backing,
>> as backing of target is source.
>>
>> But again, we share PERM_WRITE only on target, and it is shared in current code too.
> 
> I'm not so sure whether PERM_WRITE is shared only on the target.

Only on the target, as child_file is the target.

> 
>>>
>>>>    
>>>>        assert(bs);
>>>>        assert(target);
>>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>>    
>>>>        copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>>>>    
>>>> -    /* job->len is fixed, so we can't allow resize */
>>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
>>>> -                           BLK_PERM_CONSISTENT_READ,
>>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>>>> -                           speed, creation_flags, cb, opaque, errp);
>>>> -    if (!job) {
>>>> +    /*
>>>> +     * bdrv_get_device_name will not help to find device name starting from
>>>> +     * @bs after backup-top append,
>>>
>>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
>>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
>>> same result)
>>
>> bdrv_get_device_name goes finally through role->get_name, and only root role has
>> this handler. After append we'll have backing role instead of root.
> 
> Ah, I see, I asked the wrong question.
> 
> Why is block_job_create() called on bs and not on backup_top?  mirror
> calls it on mirror_top_bs.

Good question. I don't exactly remember, maybe there were more troubles with
permissions or something. So, I'll have to try it again..

What is more beneficial?

My current approach is that the job and the filter are two sibling users of the source node:
they both do copying, and they are synchronized. In this way, it is better to read from the
source directly, to avoid extra interaction between the job and the filter..

On the other hand, if we read through the filter, we should possibly do the whole
copy operation through the filter..

What is the difference between a guest read and a backup-job read, from the filter's POV? I think:

For a guest read, the filter MUST read (as we must handle the guest request), and then, if
we don't have too many in-flight requests, the ram-cache is not full, etc., we can handle
the already-read data in terms of backup, i.e. copy it somewhere. Or we can drop it, if
we can't handle it at the moment..

For a job read, we even MAY not read at all, if our queues are full, postponing the job request.

So

Guest read: MUST read, MAY backup
Job read: MAY read and backup

So, reading through the filter offers the possibility of a common code path + native prioritization
of copy operations. This of course will need more refactoring of backup, and may be done
as a separate step, but I definitely have to at least try to create the job above the filter.

> 
>>>>                                        so let's calculate job_id before. Do
>>>> +     * it in the same way like block_job_create
>>>> +     */
>>>> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
>>>> +        job_id = bdrv_get_device_name(bs);
>>>> +    }
>>>> +
>>>> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
>>>> +    if (!backup_top) {
>>>>            goto error;
>>>>        }
>>>>    
>>>> -    /* The target must match the source in size, so no resize here either */
>>>> -    job->target = blk_new(BLK_PERM_WRITE,
>>>> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>>> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
>>>> -    ret = blk_insert_bs(job->target, target, errp);
>>>> -    if (ret < 0) {
>>>> +    /* job->len is fixed, so we can't allow resize */
>>>> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
>>>
>>> Is there a reason you dropped PERM_CONSISTENT_READ here?
>>
>> Because, we don't use this blk for read now, we read through backup_top child.
> 
> Makes sense.
> 
>>>> +                           all_except_resize, speed, creation_flags,
>>>> +                           cb, opaque, errp);
>>>> +    if (!job) {
>>>>            goto error;
>>>>        }
>>>>    
>>>> +    job->source = backup_top->backing;
>>>> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
>>>
>>> This looks really ugly.  I think as long as the block job performs
>>> writes itself, it should use its own BlockBackend.
>>
>> They are not BlockBackends, they are BdrvChildren.
> 
> Exactly, which is what I don't like.  They are children of a node that
> is implemented in a different file, it looks weird to use them here.
> 
>> It was Kevin's idea to reuse filter's
>> children in backup job:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01017.html
> 
> It's still ugly if backup_top is in a different file.  Well, maybe just
> to me.
> 
>> Hmm, and this is also why I need PERM_WRITE in backup_top, to write to target.
>>
>>>
>>> Alternatively, I think it would make sense for the backup-top filter to
>>> offer functions that this job can use to issue writes to the target.
>>> Then the backup job would no longer need direct access to the target as
>>> a BdrvChild.
> 
> So what would be the problem with this?
> 

I have no specific arguments against it, so I'll try. Kevin, do you have comments on this?

-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-28 16:44         ` Vladimir Sementsov-Ogievskiy
@ 2019-01-28 16:53           ` Max Reitz
  2019-01-28 17:14             ` Vladimir Sementsov-Ogievskiy
  2019-01-28 17:40           ` Kevin Wolf
  1 sibling, 1 reply; 54+ messages in thread
From: Max Reitz @ 2019-01-28 16:53 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow


On 28.01.19 17:44, Vladimir Sementsov-Ogievskiy wrote:
> 28.01.2019 18:59, Max Reitz wrote:
>> On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
>>> 18.01.2019 17:56, Max Reitz wrote:
>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:

[...]

>>>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>>                if (alloced < 0) {
>>>>>                    ret = alloced;
>>>>>                } else {
>>>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
>>>>> +                    trace_backup_do_cow_skip(job, offset);
>>>>> +                    continue; /* already copied */
>>>>> +                }
>>>>> +                if (!lock) {
>>>>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
>>>>> +                    /*
>>>>> +                     * Dirty bit is set, which means that there are no in-flight
>>>>> +                     * write requests on this area. We must succeed.
>>>>> +                     */
>>>>> +                    assert(lock);
>>>>
>>>> What if I have a different parent node for the source that issues
>>>> concurrent writes?  This used to work fine because the before_write
>>>> notifier would still work.  After this patch, that would be broken
>>>> because those writes would not cause a CbW.
>>>
>>> But haw could you have this different parent node? After appending filter,
>>> there should not be such nodes.
>>
>> Unless you append them afterwards:
>>
>>> And I think, during backup it should be
>>> forbidden to append new parents to source, ignoring filter, as it definitely
>>> breaks what filter does.
>>
>> Agreed, but then this needs to be implemented.
>>
>>> And it applies to other block-job with their filters.
>>> If we appended a filter, we don't want someone other to write omit our filter.
>>>
>>>>
>>>> That's not so bad because we just have to make sure that all writes go
>>>> through the backup-top node.  That would make this assertion valid
>>>> again, too.  But that means we cannot share PERM_WRITE; see [1].
>>>
>>> But we don't share PERM_WRITE on source in backup_top, only on target.
>>
>> Are you sure?  The job itself shares it, and the filter shares it, too,
>> as far as I can see.  It uses bdrv_filter_default_perms(), and that does
>> seem to share PERM_WRITE.
> 
> And in bdrv_Filter_default_perms it does "*nshared = *nshared | BLK_PERM_WRITE"
> only for child_file, it is target. Source is child_backing.

Hm?  bdrv_filter_default_perms() does this, unconditionally:

>     *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) |
>                (c->shared_perm & DEFAULT_PERM_UNCHANGED);

The backup_top filter does what you describe, but it just leaves
*nshared untouched after bdrv_filter_default_perms() has done the above.

[...]

>>>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>>>    
>>>>>        copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>>>>>    
>>>>> -    /* job->len is fixed, so we can't allow resize */
>>>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
>>>>> -                           BLK_PERM_CONSISTENT_READ,
>>>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>>>>> -                           speed, creation_flags, cb, opaque, errp);
>>>>> -    if (!job) {
>>>>> +    /*
>>>>> +     * bdrv_get_device_name will not help to find device name starting from
>>>>> +     * @bs after backup-top append,
>>>>
>>>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
>>>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
>>>> same result)
>>>
>>> bdrv_get_device_name goes finally through role->get_name, and only root role has
>>> this handler. After append we'll have backing role instead of root.
>>
>> Ah, I see, I asked the wrong question.
>>
>> Why is block_job_create() called on bs and not on backup_top?  mirror
>> calls it on mirror_top_bs.
> 
> Good question. I don't exactly remember, may be there are were more troubles with
> permissions or somthing. So, I've to try it again..
> 
> What is more beneficial?
> 
> My current approach, is that job and filter are two sibling users of source node,
> they do copying, they are synchronized. And in this way, it is better to read from
> source directly, to not create extra intersection between job and filter..
> 
> On the other hand, if we read through the filter, we possible should do the whole
> copy operation through the filter..
> 
> What is the difference between guest read and backup-job read, in filter POV? I think:
> 
> For guest read, filter MUST read (as we must handle guest request), and than, if
> we don't have too much in-flight requests, ram-cache is not full, etc, we can handle
> already read data in terms of backup, so, copy it somewhere. Or we can drop it, if
> we can't handle it at the moment..
> 
> For job read, we even MAY not read, if our queues are full, postponing job request.
> 
> So
> 
> Guest read: MUST read, MAY backup
> Job read: MAY read and backup
> 
> So, reading through filter has a possibility of common code path + native prioritization
> of copy operations. This of course will need more refactoring of backup, and may be done
> as a separate step, but definitely, I have to at least try to create job above the filter.

Well, as far as I see it, right now backup_top's read function is just a
passthrough.  I don't see a functional difference between reading from
backup_top and from source, apart from the fact that you could save yourself
the trouble of figuring out the job ID manually.

As for the RAM cache, I thought it was just a target like any other and
backup_top wouldn't need to care at all...?

Max



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-28 16:53           ` Max Reitz
@ 2019-01-28 17:14             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-28 17:14 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

28.01.2019 19:53, Max Reitz wrote:
> On 28.01.19 17:44, Vladimir Sementsov-Ogievskiy wrote:
>> 28.01.2019 18:59, Max Reitz wrote:
>>> On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
>>>> 18.01.2019 17:56, Max Reitz wrote:
>>>>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> 
> [...]
> 
>>>>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>>>                 if (alloced < 0) {
>>>>>>                     ret = alloced;
>>>>>>                 } else {
>>>>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
>>>>>> +                    trace_backup_do_cow_skip(job, offset);
>>>>>> +                    continue; /* already copied */
>>>>>> +                }
>>>>>> +                if (!lock) {
>>>>>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
>>>>>> +                    /*
>>>>>> +                     * Dirty bit is set, which means that there are no in-flight
>>>>>> +                     * write requests on this area. We must succeed.
>>>>>> +                     */
>>>>>> +                    assert(lock);
>>>>>
>>>>> What if I have a different parent node for the source that issues
>>>>> concurrent writes?  This used to work fine because the before_write
>>>>> notifier would still work.  After this patch, that would be broken
>>>>> because those writes would not cause a CbW.
>>>>
>>>> But haw could you have this different parent node? After appending filter,
>>>> there should not be such nodes.
>>>
>>> Unless you append them afterwards:
>>>
>>>> And I think, during backup it should be
>>>> forbidden to append new parents to source, ignoring filter, as it definitely
>>>> breaks what filter does.
>>>
>>> Agreed, but then this needs to be implemented.
>>>
>>>> And it applies to other block-job with their filters.
>>>> If we appended a filter, we don't want someone other to write omit our filter.
>>>>
>>>>>
>>>>> That's not so bad because we just have to make sure that all writes go
>>>>> through the backup-top node.  That would make this assertion valid
>>>>> again, too.  But that means we cannot share PERM_WRITE; see [1].
>>>>
>>>> But we don't share PERM_WRITE on source in backup_top, only on target.
>>>
>>> Are you sure?  The job itself shares it, and the filter shares it, too,
>>> as far as I can see.  It uses bdrv_filter_default_perms(), and that does
>>> seem to share PERM_WRITE.
>>
>> And in bdrv_Filter_default_perms it does "*nshared = *nshared | BLK_PERM_WRITE"
>> only for child_file, it is target. Source is child_backing.
> 
> Hm?  bdrv_filter_default_perms() does this, unconditionally:
> 
>>      *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) |
>>                 (c->shared_perm & DEFAULT_PERM_UNCHANGED);
> 
> The backup_top filter does what you describe, but it just leaves
> *nshared untouched after bdrv_filter_default_perms() has done the above.


Oops, yes, I meant my backup_top_child_perm(), not bdrv_filter_default_perms(), which
it calls too. So, OK, I will try to fix it. Sorry for the long extra arguing :(

> 
> [...]
> 
>>>>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
>>>>>>     
>>>>>>         copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
>>>>>>     
>>>>>> -    /* job->len is fixed, so we can't allow resize */
>>>>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
>>>>>> -                           BLK_PERM_CONSISTENT_READ,
>>>>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>>>>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
>>>>>> -                           speed, creation_flags, cb, opaque, errp);
>>>>>> -    if (!job) {
>>>>>> +    /*
>>>>>> +     * bdrv_get_device_name will not help to find device name starting from
>>>>>> +     * @bs after backup-top append,
>>>>>
>>>>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
>>>>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
>>>>> same result)
>>>>
>>>> bdrv_get_device_name() eventually goes through role->get_name, and only the root
>>>> role has this handler. After the append we'll have a backing role instead of root.
>>>
>>> Ah, I see, I asked the wrong question.
>>>
>>> Why is block_job_create() called on bs and not on backup_top?  mirror
>>> calls it on mirror_top_bs.
>>
>> Good question. I don't remember exactly; maybe there were more troubles with
>> permissions or something. So, I'll have to try it again..
>>
>> What is more beneficial?
>>
>> My current approach is that the job and the filter are two sibling users of the
>> source node; they do the copying and they are synchronized. In this way, it is
>> better to read from the source directly, so as not to create extra intersection
>> between the job and the filter..
>>
>> On the other hand, if we read through the filter, we probably should do the whole
>> copy operation through the filter..
>>
>> What is the difference between a guest read and a backup-job read, from the
>> filter's POV? I think:
>>
>> For a guest read, the filter MUST read (as we must handle the guest request), and
>> then, if we don't have too many in-flight requests, the ram-cache is not full,
>> etc., we can handle the already-read data in terms of backup, i.e. copy it
>> somewhere. Or we can drop it, if we can't handle it at the moment..
>>
>> For a job read, we even MAY not read, if our queues are full, postponing the job
>> request.
>>
>> So
>>
>> Guest read: MUST read, MAY backup
>> Job read: MAY read and backup
>>
>> So, reading through the filter opens the possibility of a common code path +
>> native prioritization of copy operations. This of course will need more
>> refactoring of backup, and may be done as a separate step, but I definitely have
>> to at least try to create the job above the filter.
> 
> Well, as far as I see it, right now backup_top's read function is just a
> passthrough.  I don't see a functional difference between reading from
> backup_top and source, but the fact that you could save yourself the
> trouble of figuring out the job ID manually.
> 
> As for the RAM cache, I thought it was just a target like any other and
> backup_top wouldn't need to care at all...?

Not sure.. We can have a RAM cache backed by a disk cache, together with the actual
backup target (NBD, for example).. And how they should all finally be connected is a
question; I think it would be simpler to figure that out when I start implementing it.

I hope the RAM cache can somehow be an alternative to in-flight copy requests. A RAM
cache is better, as its data may clearly be reused by guest reads.. Hmm, maybe we'll
finish with two jobs, one copying from the active disk to the RAM cache, and one from
the RAM cache to the target? Or would the RAM cache live inside one backup job?

So again, I'd prefer to return to these questions after backup_top is completed, as
it's simpler to discuss something that works.


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-28 16:44         ` Vladimir Sementsov-Ogievskiy
  2019-01-28 16:53           ` Max Reitz
@ 2019-01-28 17:40           ` Kevin Wolf
  2019-01-28 19:00             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 54+ messages in thread
From: Kevin Wolf @ 2019-01-28 17:40 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Max Reitz, qemu-block, qemu-devel, fam, stefanha, jcody,
	Denis Lunev, eblake, jsnow

On 28.01.2019 at 17:44, Vladimir Sementsov-Ogievskiy wrote:
> 28.01.2019 18:59, Max Reitz wrote:
> > On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
> >> 18.01.2019 17:56, Max Reitz wrote:
> >>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> >>>> Drop write notifiers and use filter node instead. Changes:
> >>>>
> >>>> 1. copy-before-writes now handled by filter node, so, drop all
> >>>>      is_write_notifier arguments.
> >>>>
> >>>> 2. we don't have intersecting requests, so their handling is dropped.
> >>>> Instead, synchronization works as follows:
> >>>> when backup or backup-top starts copying of some area it firstly
> >>>> clears copy-bitmap bits, and nobody touches areas, not marked with
> >>>> dirty bits in copy-bitmap, so there is no intersection. Also, backup
> >>>> job copy operations are surrounded by bdrv region lock, which is
> >>>> actually a serializing request, so as not to interfere with guest writes and
> >>>> not read changed data from the source (before reading we clear the
> >>>> corresponding bit in copy-bitmap, so this area is no longer handled by
> >>>> backup-top).
> >>>>
> >>>> 3. To sync with in-flight requests we now just drain hook node, we
> >>>> don't need rw-lock.
> >>>>
> >>>> 4. After the whole backup loop (top, full, incremental modes), we need
> >>>> to check for not copied clusters, which were under backup-top operation
> >>>> and we skipped them, but backup-top operation failed, error returned to
> >>>> the guest and dirty bits set back.
> >>>>
> >>>> 5. Don't create additional blk, use backup-top children for copy
> >>>> operations.
> >>>>
> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>> ---
> >>>>    block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
> >>>>    1 file changed, 149 insertions(+), 136 deletions(-)
> >>>>
> >>>> diff --git a/block/backup.c b/block/backup.c
> >>>> index 88c0242b4e..e332909fb7 100644
> >>>> --- a/block/backup.c
> >>>> +++ b/block/backup.c
> >>>
> >>> [...]
> >>>
> >>>> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
> >>>>    static void backup_clean(Job *job)
> >>>>    {
> >>>>        BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> >>>> -    assert(s->target);
> >>>> -    blk_unref(s->target);
> >>>> +
> >>>> +    /* We must clean it to not crash in backup_drain. */
> >>>>        s->target = NULL;
> >>>
> >>> Why not set s->source to NULL along with it?  It makes sense if you're
> >>> going to drop the backup-top node because both of these are its children.
> >>
> >> agree.
> >>
> >>>
> >>>>    
> >>>>        if (s->copy_bitmap) {
> >>>>            hbitmap_free(s->copy_bitmap);
> >>>>            s->copy_bitmap = NULL;
> >>>>        }
> >>>> +
> >>>> +    bdrv_backup_top_drop(s->backup_top);
> >>>>    }
> >>>
> >>> [...]
> >>>
> >>>> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
> >>>>        bool error_is_read;
> >>>>        int64_t offset;
> >>>>        HBitmapIter hbi;
> >>>> +    void *lock = NULL;
> >>>>    
> >>>>        hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> >>>> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
> >>>> +    while (hbitmap_count(job->copy_bitmap)) {
> >>>> +        offset = hbitmap_iter_next(&hbi);
> >>>> +        if (offset == -1) {
> >>>> +            /*
> >>>> +             * we may have skipped some clusters, which were handled by
> >>>> +             * backup-top, but failed and finished by returning error to
> >>>> +             * the guest and set dirty bit back.
> >>>> +             */
> >>>> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> >>>> +            offset = hbitmap_iter_next(&hbi);
> >>>> +            assert(offset);
> >>>
> >>> I think you want to assert "offset >= 0".
> >>>
> >>>> +        }
> >>>> +
> >>>> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> >>>> +        /*
> >>>> +         * Dirty bit is set, which means that there are no in-flight
> >>>> +         * write requests on this area. We must succeed.
> >>>> +         */
> >>>> +        assert(lock);
> >>>
> >>> I'm not sure that is true right now, but more on that below in backup_run().
> >>>
> >>>> +
> >>>>            do {
> >>>>                if (yield_and_check(job)) {
> >>>> +                bdrv_co_unlock(lock);
> >>>>                    return 0;
> >>>>                }
> >>>> -            ret = backup_do_cow(job, offset,
> >>>> -                                job->cluster_size, &error_is_read, false);
> >>>> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
> >>>>                if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
> >>>>                               BLOCK_ERROR_ACTION_REPORT)
> >>>>                {
> >>>> +                bdrv_co_unlock(lock);
> >>>>                    return ret;
> >>>>                }
> >>>>            } while (ret < 0);
> >>>> +
> >>>> +        bdrv_co_unlock(lock);
> >>>> +        lock = NULL;
> >>>
> >>> This statement seems unnecessary here.
> >>>
> >>>>        }
> >>>>    
> >>>>        return 0;
> >>>
> >>> [...]
> >>>
> >>>> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> >>>>            hbitmap_set(s->copy_bitmap, 0, s->len);
> >>>>        }
> >>>>    
> >>>> -    s->before_write.notify = backup_before_write_notify;
> >>>> -    bdrv_add_before_write_notifier(bs, &s->before_write);
> >>>> -
> >>>>        if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
> >>>>            /* All bits are set in copy_bitmap to allow any cluster to be copied.
> >>>>             * This does not actually require them to be copied. */
> >>>>            while (!job_is_cancelled(job)) {
> >>>> -            /* Yield until the job is cancelled.  We just let our before_write
> >>>> -             * notify callback service CoW requests. */
> >>>> +            /*
> >>>> +             * Yield until the job is cancelled.  We just let our backup-top
> >>>> +             * fileter driver service CbW requests.
> >>>
> >>> *filter
> >>>
> >>>> +             */
> >>>>                job_yield(job);
> >>>>            }
> >>>>        } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
> >>>>            ret = backup_run_incremental(s);
> >>>>        } else {
> >>>
> >>> [...]
> >>>
> >>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> >>>>                if (alloced < 0) {
> >>>>                    ret = alloced;
> >>>>                } else {
> >>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
> >>>> +                    trace_backup_do_cow_skip(job, offset);
> >>>> +                    continue; /* already copied */
> >>>> +                }
> >>>> +                if (!lock) {
> >>>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
> >>>> +                    /*
> >>>> +                     * Dirty bit is set, which means that there are no in-flight
> >>>> +                     * write requests on this area. We must succeed.
> >>>> +                     */
> >>>> +                    assert(lock);
> >>>
> >>> What if I have a different parent node for the source that issues
> >>> concurrent writes?  This used to work fine because the before_write
> >>> notifier would still work.  After this patch, that would be broken
> >>> because those writes would not cause a CbW.
> >>
> >> But how could you have such a different parent node? After appending the
> >> filter, there should not be any such nodes.
> > 
> > Unless you append them afterwards:
> > 
> >> And I think, during backup it should be
> >> forbidden to append new parents to source, ignoring filter, as it definitely
> >> breaks what filter does.
> > 
> > Agreed, but then this needs to be implemented.
> > 
> >> And it applies to other block jobs with their filters.
> >> If we appended a filter, we don't want anyone else to write omitting our filter.
> >>
> >>>
> >>> That's not so bad because we just have to make sure that all writes go
> >>> through the backup-top node.  That would make this assertion valid
> >>> again, too.  But that means we cannot share PERM_WRITE; see [1].
> >>
> >> But we don't share PERM_WRITE on source in backup_top, only on target.
> > 
> > Are you sure?  The job itself shares it, and the filter shares it, too,
> > as far as I can see.  It uses bdrv_filter_default_perms(), and that does
> > seem to share PERM_WRITE.
> 
> And in bdrv_filter_default_perms() it does "*nshared = *nshared | BLK_PERM_WRITE"
> only for child_file, which is the target. The source is child_backing.
> 
> > 
> >>>
> >>>> +                }
> >>>>                    ret = backup_do_cow(s, offset, s->cluster_size,
> >>>> -                                    &error_is_read, false);
> >>>> +                                    &error_is_read);
> >>>>                }
> >>>>                if (ret < 0) {
> >>>>                    /* Depending on error action, fail now or retry cluster */
> >>>> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> >>>>                        break;
> >>>>                    } else {
> >>>>                        offset -= s->cluster_size;
> >>>> +                    retry = true;
> >>>>                        continue;
> >>>>                    }
> >>>>                }
> >>>>            }
> >>>> +        if (lock) {
> >>>> +            bdrv_co_unlock(lock);
> >>>> +            lock = NULL;
> >>>> +        }
> >>>> +        if (ret == 0 && !job_is_cancelled(job) &&
> >>>> +            hbitmap_count(s->copy_bitmap))
> >>>> +        {
> >>>> +            /*
> >>>> +             * we may have skipped some clusters, which were handled by
> >>>> +             * backup-top, but failed and finished by returning error to
> >>>> +             * the guest and set dirty bit back.
> >>>
> >>> So it's a matter of a race?
> >>>
> >>>> +             */
> >>>> +            goto iteration;
> >>>> +        }
> >>>
> >>> Why not wrap everything in a do {} while (ret == 0 && !job_is...)
> >>> instead?  Because it would mean reindenting everything?
> >>
> >> I don't remember, but I assume so. And this code is anyway "to be refactored";
> >> I want FULL/TOP/INCREMENTAL to all go through (mostly) the same code path.
> > 
> > Hm, well, if you want to refactor it later anyway...  But I don't like
> > gotos that go backwards very much, unless there is a good reason to have
> > them (and there isn't here).
> 
> 
> Ok, not a real problem. Let's go on with do-while.
> 
> 
> > 
> >>>>        }
> >>>>    
> >>>> -    notifier_with_return_remove(&s->before_write);
> >>>> +    /* wait pending CBW operations in backup-top */
> >>>> +    bdrv_drain(s->backup_top);
> >>>>    
> >>>> -    /* wait until pending backup_do_cow() calls have completed */
> >>>> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
> >>>> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
> >>>> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
> >>>> +    job_progress_update(job, ret + backup_top_progress -
> >>>
> >>> Why the "ret"?
> >>
> >> oops, it looks like a copy-paste bug ("ret" is reasonable in backup_do_cow())
> >>
> >>>
> >>>> +                        s->backup_top_progress);
> >>>> +    s->backup_top_progress = backup_top_progress;
> >>>
> >>> So the backup-top progress is ignored during basically all of the block
> >>> job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.
> >>>
> >>> Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
> >>>    (And the while() loop of MODE_NONE)
> >>
> >>
> >> It is done in backup_do_cow(), so FULL and TOP are covered.
> >>
> >> But you are right that MODE_NONE seems to have a problem here.. And just updating
> >> it in a while loop would not work, as I doubt that job_yield will return until
> >> the job finishes or there is user interaction like pause/continue/cancel..
> >>
> >> So now, it looks better to call job_progress_update() from backup_top directly, and drop
> >> this hack.
> > 
> > Hmmm...  I don't think job_*() calls belong in backup_top.  How about
> > adding a callback to bdrv_backup_top_append()?
> 
> Ok for me at least as a temporary step.
> 
> > 
> >>>>    
> >>>>        return ret;
> >>>>    }
> >>>> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> >>>>        int ret;
> >>>>        int64_t cluster_size;
> >>>>        HBitmap *copy_bitmap = NULL;
> >>>> +    BlockDriverState *backup_top = NULL;
> >>>> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> >>>> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
> >>>
> >>> BLK_PERM_ALL & ~BLK_PERM_RESIZE?
> >>>
> >>> [1] But we probably do not want to share either PERM_WRITE or
> >>> PERM_WRITE_UNCHANGED because during the duration of the backup,
> >>> everything should go through the backup-top filter (not sure about
> >>> PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
> >>> node should enforce in backup_top_child_perm()?
> >>
> >> It's not shared perm of backup_top, it's a shared perm of block-job common.blk, which is
> >> used only to "job->len is fixed, so we can't allow resize", so this part is not changed.
> >>
> >> So yes, the problem you mean by [1] is about backup_top_child_perm() where we share PERM_WRITE.
> >> And it is described by comment, we must share this write perm, otherwise we break guest writes.
> > 
> > For the target, yes, but the problem is sharing it on the source.
> > 
> >> We share PERM_WRITE in backup_top to force its target child to share PERM_WRITE
> >> on its backing, as the backing of the target is the source.
> >>
> >> But again, we share PERM_WRITE only on target, and it is shared in current code too.
> > 
> > I'm not so sure whether PERM_WRITE is shared only on the target.
> 
> Only on target, as child_file is target.
> 
> > 
> >>>
> >>>>    
> >>>>        assert(bs);
> >>>>        assert(target);
> >>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> >>>>    
> >>>>        copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
> >>>>    
> >>>> -    /* job->len is fixed, so we can't allow resize */
> >>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
> >>>> -                           BLK_PERM_CONSISTENT_READ,
> >>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> >>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> >>>> -                           speed, creation_flags, cb, opaque, errp);
> >>>> -    if (!job) {
> >>>> +    /*
> >>>> +     * bdrv_get_device_name will not help to find device name starting from
> >>>> +     * @bs after backup-top append,
> >>>
> >>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
> >>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
> >>> same result)
> >>
> >> bdrv_get_device_name() eventually goes through role->get_name, and only the root
> >> role has this handler. After the append we'll have a backing role instead of root.
> > 
> > Ah, I see, I asked the wrong question.
> > 
> > Why is block_job_create() called on bs and not on backup_top?  mirror
> > calls it on mirror_top_bs.
> 
> Good question. I don't remember exactly; maybe there were more troubles with
> permissions or something. So, I'll have to try it again..
> 
> What is more beneficial?
> 
> My current approach is that the job and the filter are two sibling users of the
> source node; they do the copying and they are synchronized. In this way, it is
> better to read from the source directly, so as not to create extra intersection
> between the job and the filter..
> 
> On the other hand, if we read through the filter, we probably should do the whole
> copy operation through the filter..
> 
> What is the difference between a guest read and a backup-job read, from the
> filter's POV? I think:
> 
> For a guest read, the filter MUST read (as we must handle the guest request), and
> then, if we don't have too many in-flight requests, the ram-cache is not full,
> etc., we can handle the already-read data in terms of backup, i.e. copy it
> somewhere. Or we can drop it, if we can't handle it at the moment..
> 
> For a job read, we even MAY not read, if our queues are full, postponing the job
> request.
> 
> So
> 
> Guest read: MUST read, MAY backup
> Job read: MAY read and backup
> 
> So, reading through the filter opens the possibility of a common code path +
> native prioritization of copy operations. This of course will need more
> refactoring of backup, and may be done as a separate step, but I definitely have
> to at least try to create the job above the filter.
> 
> > 
> >>>>                                        so let's calculate job_id before. Do
> >>>> +     * it in the same way like block_job_create
> >>>> +     */
> >>>> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
> >>>> +        job_id = bdrv_get_device_name(bs);
> >>>> +    }
> >>>> +
> >>>> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
> >>>> +    if (!backup_top) {
> >>>>            goto error;
> >>>>        }
> >>>>    
> >>>> -    /* The target must match the source in size, so no resize here either */
> >>>> -    job->target = blk_new(BLK_PERM_WRITE,
> >>>> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> >>>> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
> >>>> -    ret = blk_insert_bs(job->target, target, errp);
> >>>> -    if (ret < 0) {
> >>>> +    /* job->len is fixed, so we can't allow resize */
> >>>> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
> >>>
> >>> Is there a reason you dropped PERM_CONSISTENT_READ here?
> >>
> >> Because we don't use this blk for reads now; we read through the backup_top child.
> > 
> > Makes sense.
> > 
> >>>> +                           all_except_resize, speed, creation_flags,
> >>>> +                           cb, opaque, errp);
> >>>> +    if (!job) {
> >>>>            goto error;
> >>>>        }
> >>>>    
> >>>> +    job->source = backup_top->backing;
> >>>> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
> >>>
> >>> This looks really ugly.  I think as long as the block job performs
> >>> writes itself, it should use its own BlockBackend.
> >>
> >> They are not BlockBackends, they are BdrvChildren.
> > 
> > Exactly, which is what I don't like.  They are children of a node that
> > is implemented in a different file, it looks weird to use them here.
> > 
> >> It was Kevin's idea to reuse filter's
> >> children in backup job:
> >>
> >> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01017.html
> > 
> > It's still ugly if backup_top is in a different file.  Well, maybe just
> > to me.
> > 
> >> Hmm, and this is also why I need PERM_WRITE in backup_top, to write to target.
> >>
> >>>
> >>> Alternatively, I think it would make sense for the backup-top filter to
> >>> offer functions that this job can use to issue writes to the target.
> >>> Then the backup job would no longer need direct access to the target as
> >>> a BdrvChild.
> > 
> > So what would be the problem with this?
> 
> I have no specific arguments against it, I'll try. Kevin, do you have comments on this?

I haven't really followed this thread recently because there was other
stuff that needed my attention and you seemed to have a good discussion
with Max. If you think my input would be important at this point, I can
try to read up on it tomorrow.

Kevin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers
  2019-01-28 17:40           ` Kevin Wolf
@ 2019-01-28 19:00             ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-01-28 19:00 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Max Reitz, qemu-block@nongnu.org, qemu-devel, fam@euphon.net,
	stefanha@redhat.com, jcody, Denis Lunev, eblake, jsnow@redhat.com

On 28 Jan 2019 at 20:40, Kevin Wolf <kwolf@redhat.com> wrote:

On 28.01.2019 at 17:44, Vladimir Sementsov-Ogievskiy wrote:
> 28.01.2019 18:59, Max Reitz wrote:
> > On 28.01.19 12:29, Vladimir Sementsov-Ogievskiy wrote:
> >> 18.01.2019 17:56, Max Reitz wrote:
> >>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
> >>>> Drop write notifiers and use filter node instead. Changes:
> >>>>
> >>>> 1. copy-before-writes now handled by filter node, so, drop all
> >>>>      is_write_notifier arguments.
> >>>>
> >>>> 2. we don't have intersecting requests, so their handling is dropped.
> >>>> Instead, synchronization works as follows:
> >>>> when backup or backup-top starts copying of some area it firstly
> >>>> clears copy-bitmap bits, and nobody touches areas, not marked with
> >>>> dirty bits in copy-bitmap, so there is no intersection. Also, backup
> >>>> job copy operations are surrounded by bdrv region lock, which is
> >>>> actually a serializing request, so as not to interfere with guest writes and
> >>>> not read changed data from the source (before reading we clear the
> >>>> corresponding bit in copy-bitmap, so this area is no longer handled by
> >>>> backup-top).
> >>>>
> >>>> 3. To sync with in-flight requests we now just drain hook node, we
> >>>> don't need rw-lock.
> >>>>
> >>>> 4. After the whole backup loop (top, full, incremental modes), we need
> >>>> to check for not copied clusters, which were under backup-top operation
> >>>> and we skipped them, but backup-top operation failed, error returned to
> >>>> the guest and dirty bits set back.
> >>>>
> >>>> 5. Don't create additional blk, use backup-top children for copy
> >>>> operations.
> >>>>
> >>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>> ---
> >>>>    block/backup.c | 285 ++++++++++++++++++++++++++-----------------------
> >>>>    1 file changed, 149 insertions(+), 136 deletions(-)
> >>>>
> >>>> diff --git a/block/backup.c b/block/backup.c
> >>>> index 88c0242b4e..e332909fb7 100644
> >>>> --- a/block/backup.c
> >>>> +++ b/block/backup.c
> >>>
> >>> [...]
> >>>
> >>>> @@ -300,21 +231,23 @@ static void backup_abort(Job *job)
> >>>>    static void backup_clean(Job *job)
> >>>>    {
> >>>>        BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
> >>>> -    assert(s->target);
> >>>> -    blk_unref(s->target);
> >>>> +
> >>>> +    /* We must clean it to not crash in backup_drain. */
> >>>>        s->target = NULL;
> >>>
> >>> Why not set s->source to NULL along with it?  It makes sense if you're
> >>> going to drop the backup-top node because both of these are its children.
> >>
> >> agree.
> >>
> >>>
> >>>>
> >>>>        if (s->copy_bitmap) {
> >>>>            hbitmap_free(s->copy_bitmap);
> >>>>            s->copy_bitmap = NULL;
> >>>>        }
> >>>> +
> >>>> +    bdrv_backup_top_drop(s->backup_top);
> >>>>    }
> >>>
> >>> [...]
> >>>
> >>>> @@ -386,21 +319,45 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
> >>>>        bool error_is_read;
> >>>>        int64_t offset;
> >>>>        HBitmapIter hbi;
> >>>> +    void *lock = NULL;
> >>>>
> >>>>        hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> >>>> -    while ((offset = hbitmap_iter_next(&hbi)) != -1) {
> >>>> +    while (hbitmap_count(job->copy_bitmap)) {
> >>>> +        offset = hbitmap_iter_next(&hbi);
> >>>> +        if (offset == -1) {
> >>>> +            /*
> >>>> +             * we may have skipped some clusters, which were handled by
> >>>> +             * backup-top, but failed and finished by returning error to
> >>>> +             * the guest and set dirty bit back.
> >>>> +             */
> >>>> +            hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
> >>>> +            offset = hbitmap_iter_next(&hbi);
> >>>> +            assert(offset);
> >>>
> >>> I think you want to assert "offset >= 0".
> >>>
> >>>> +        }
> >>>> +
> >>>> +        lock = bdrv_co_try_lock(job->source, offset, job->cluster_size);
> >>>> +        /*
> >>>> +         * Dirty bit is set, which means that there are no in-flight
> >>>> +         * write requests on this area. We must succeed.
> >>>> +         */
> >>>> +        assert(lock);
> >>>
> >>> I'm not sure that is true right now, but more on that below in backup_run().
> >>>
> >>>> +
> >>>>            do {
> >>>>                if (yield_and_check(job)) {
> >>>> +                bdrv_co_unlock(lock);
> >>>>                    return 0;
> >>>>                }
> >>>> -            ret = backup_do_cow(job, offset,
> >>>> -                                job->cluster_size, &error_is_read, false);
> >>>> +            ret = backup_do_cow(job, offset, job->cluster_size, &error_is_read);
> >>>>                if (ret < 0 && backup_error_action(job, error_is_read, -ret) ==
> >>>>                               BLOCK_ERROR_ACTION_REPORT)
> >>>>                {
> >>>> +                bdrv_co_unlock(lock);
> >>>>                    return ret;
> >>>>                }
> >>>>            } while (ret < 0);
> >>>> +
> >>>> +        bdrv_co_unlock(lock);
> >>>> +        lock = NULL;
> >>>
> >>> This statement seems unnecessary here.
> >>>
> >>>>        }
> >>>>
> >>>>        return 0;
> >>>
> >>> [...]
> >>>
> >>>> @@ -447,26 +402,39 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> >>>>            hbitmap_set(s->copy_bitmap, 0, s->len);
> >>>>        }
> >>>>
> >>>> -    s->before_write.notify = backup_before_write_notify;
> >>>> -    bdrv_add_before_write_notifier(bs, &s->before_write);
> >>>> -
> >>>>        if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
> >>>>            /* All bits are set in copy_bitmap to allow any cluster to be copied.
> >>>>             * This does not actually require them to be copied. */
> >>>>            while (!job_is_cancelled(job)) {
> >>>> -            /* Yield until the job is cancelled.  We just let our before_write
> >>>> -             * notify callback service CoW requests. */
> >>>> +            /*
> >>>> +             * Yield until the job is cancelled.  We just let our backup-top
> >>>> +             * fileter driver service CbW requests.
> >>>
> >>> *filter
> >>>
> >>>> +             */
> >>>>                job_yield(job);
> >>>>            }
> >>>>        } else if (s->sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
> >>>>            ret = backup_run_incremental(s);
> >>>>        } else {
> >>>
> >>> [...]
> >>>
> >>>> @@ -505,8 +474,20 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> >>>>                if (alloced < 0) {
> >>>>                    ret = alloced;
> >>>>                } else {
> >>>> +                if (!hbitmap_get(s->copy_bitmap, offset)) {
> >>>> +                    trace_backup_do_cow_skip(job, offset);
> >>>> +                    continue; /* already copied */
> >>>> +                }
> >>>> +                if (!lock) {
> >>>> +                    lock = bdrv_co_try_lock(s->source, offset, s->cluster_size);
> >>>> +                    /*
> >>>> +                     * Dirty bit is set, which means that there are no in-flight
> >>>> +                     * write requests on this area. We must succeed.
> >>>> +                     */
> >>>> +                    assert(lock);
> >>>
> >>> What if I have a different parent node for the source that issues
> >>> concurrent writes?  This used to work fine because the before_write
> >>> notifier would still work.  After this patch, that would be broken
> >>> because those writes would not cause a CbW.
> >>
> >> But how could you have such a different parent node? After appending the
> >> filter, there should be no such nodes.
> >
> > Unless you append them afterwards:
> >
> >> And I think that during backup it should be
> >> forbidden to append new parents to the source, bypassing the filter, as that
> >> definitely breaks what the filter does.
> >
> > Agreed, but then this needs to be implemented.
> >
> >> And the same applies to other block jobs with their filters.
> >> If we appended a filter, we don't want anyone else to write past (omitting) our filter.
> >>
> >>>
> >>> That's not so bad because we just have to make sure that all writes go
> >>> through the backup-top node.  That would make this assertion valid
> >>> again, too.  But that means we cannot share PERM_WRITE; see [1].
> >>
> >> But we don't share PERM_WRITE on source in backup_top, only on target.
> >
> > Are you sure?  The job itself shares it, and the filter shares it, too,
> > as far as I can see.  It uses bdrv_filter_default_perms(), and that does
> > seem to share PERM_WRITE.
>
> And in bdrv_filter_default_perms() it does "*nshared = *nshared | BLK_PERM_WRITE"
> only for child_file, which is the target. The source is child_backing.
>
> >
> >>>
> >>>> +                }
> >>>>                    ret = backup_do_cow(s, offset, s->cluster_size,
> >>>> -                                    &error_is_read, false);
> >>>> +                                    &error_is_read);
> >>>>                }
> >>>>                if (ret < 0) {
> >>>>                    /* Depending on error action, fail now or retry cluster */
> >>>> @@ -516,17 +497,34 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
> >>>>                        break;
> >>>>                    } else {
> >>>>                        offset -= s->cluster_size;
> >>>> +                    retry = true;
> >>>>                        continue;
> >>>>                    }
> >>>>                }
> >>>>            }
> >>>> +        if (lock) {
> >>>> +            bdrv_co_unlock(lock);
> >>>> +            lock = NULL;
> >>>> +        }
> >>>> +        if (ret == 0 && !job_is_cancelled(job) &&
> >>>> +            hbitmap_count(s->copy_bitmap))
> >>>> +        {
> >>>> +            /*
> >>>> +             * we may have skipped some clusters, which were handled by
> >>>> +             * backup-top, but failed and finished by returning error to
> >>>> +             * the guest and set dirty bit back.
> >>>
> >>> So it's a matter of a race?
> >>>
> >>>> +             */
> >>>> +            goto iteration;
> >>>> +        }
> >>>
> >>> Why not wrap everything in a do {} while (ret == 0 && !job_is...)
> >>> instead?  Because it would mean reindenting everything?
> >>
> >> Don't remember, but I assume yes. And this code is "to be refactored" anyway;
> >> I want FULL/TOP/INCREMENTAL to all go through (mostly) the same code path.
> >
> > Hm, well, if you want to refactor it later anyway...  But I don't like
> > gotos that go backwards very much, unless there is a good reason to have
> > them (and there isn't here).
>
>
> Ok, not a real problem. Let's go on with do-while.
>
>
> >
> >>>>        }
> >>>>
> >>>> -    notifier_with_return_remove(&s->before_write);
> >>>> +    /* wait pending CBW operations in backup-top */
> >>>> +    bdrv_drain(s->backup_top);
> >>>>
> >>>> -    /* wait until pending backup_do_cow() calls have completed */
> >>>> -    qemu_co_rwlock_wrlock(&s->flush_rwlock);
> >>>> -    qemu_co_rwlock_unlock(&s->flush_rwlock);
> >>>> +    backup_top_progress = bdrv_backup_top_progress(s->backup_top);
> >>>> +    job_progress_update(job, ret + backup_top_progress -
> >>>
> >>> Why the "ret"?
> >>
> >> oops, it looks like a copy-paste bug ("ret" is reasonable in backup_do_cow())
> >>
> >>>
> >>>> +                        s->backup_top_progress);
> >>>> +    s->backup_top_progress = backup_top_progress;
> >>>
> >>> So the backup-top progress is ignored during basically all of the block
> >>> job until it suddenly jumps to 100 % completion?  That doesn't sound ideal.
> >>>
> >>> Or did you mean to put this into the for () loop of MODE_TOP/MODE_FULL?
> >>>    (And the while() loop of MODE_NONE)
> >>
> >>
> >> It is done in backup_do_cow(), so FULL and TOP are covered.
> >>
> >> But you are right that MODE_NONE seems to have a problem here. And just updating
> >> it in the while loop would not work, as I doubt that job_yield() returns until
> >> the job finishes or there is user interaction like pause/continue/cancel.
> >>
> >> So now, it looks better to call job_progress_update() from backup_top directly, and drop
> >> this hack.
> >
> > Hmmm...  I don't think job_*() calls belong in backup_top.  How about
> > adding a callback to bdrv_backup_top_append()?
>
> Ok for me at least as a temporary step.
>
> >
> >>>>
> >>>>        return ret;
> >>>>    }
> >>>> @@ -563,6 +561,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> >>>>        int ret;
> >>>>        int64_t cluster_size;
> >>>>        HBitmap *copy_bitmap = NULL;
> >>>> +    BlockDriverState *backup_top = NULL;
> >>>> +    uint64_t all_except_resize = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> >>>> +                                 BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD;
> >>>
> >>> BLK_PERM_ALL & ~BLK_PERM_RESIZE?
> >>>
> >>> [1] But we probably do not want to share either PERM_WRITE or
> >>> PERM_WRITE_UNCHANGED because during the duration of the backup,
> >>> everything should go through the backup-top filter (not sure about
> >>> PERM_WRITE_UNCHANGED right now).  Or is that something that the filter
> >>> node should enforce in backup_top_child_perm()?
> >>
> >> It's not the shared perm of backup_top, it's the shared perm of the block job's
> >> common.blk, which is used only for "job->len is fixed, so we can't allow
> >> resize"; this part is not changed.
> >>
> >> So yes, the problem you mean by [1] is about backup_top_child_perm() where we share PERM_WRITE.
> >> And it is described by comment, we must share this write perm, otherwise we break guest writes.
> >
> > For the target, yes, but the problem is sharing it on the source.
> >
> >> We share PERM_WRITE in backup_top to force its target child to share PERM_WRITE
> >> on its backing, as the backing of the target is the source.
> >>
> >> But again, we share PERM_WRITE only on target, and it is shared in current code too.
> >
> > I'm not so sure whether PERM_WRITE is shared only on the target.
>
> Only on target, as child_file is target.
>
> >
> >>>
> >>>>
> >>>>        assert(bs);
> >>>>        assert(target);
> >>>> @@ -655,25 +656,31 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
> >>>>
> >>>>        copy_bitmap = hbitmap_alloc(len, ctz32(cluster_size));
> >>>>
> >>>> -    /* job->len is fixed, so we can't allow resize */
> >>>> -    job = block_job_create(job_id, &backup_job_driver, txn, bs,
> >>>> -                           BLK_PERM_CONSISTENT_READ,
> >>>> -                           BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> >>>> -                           BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD,
> >>>> -                           speed, creation_flags, cb, opaque, errp);
> >>>> -    if (!job) {
> >>>> +    /*
> >>>> +     * bdrv_get_device_name will not help to find device name starting from
> >>>> +     * @bs after backup-top append,
> >>>
> >>> Why not?  Since backup-top is appended, shouldn't all parents of @bs be
> >>> parents of @backup_top then?  (Making bdrv_get_parent_name() return the
> >>> same result)
> >>
> >> bdrv_get_device_name() finally goes through role->get_name, and only the root
> >> role has this handler. After the append we'll have the backing role instead of
> >> root.
> >
> > Ah, I see, I asked the wrong question.
> >
> > Why is block_job_create() called on bs and not on backup_top?  mirror
> > calls it on mirror_top_bs.
>
> Good question. I don't exactly remember; maybe there were more troubles with
> permissions or something. So, I'll have to try it again.
>
> What is more beneficial?
>
> My current approach is that the job and the filter are two sibling users of the
> source node; they both do copying and they are synchronized. In this scheme it is
> better to read from the source directly, so as not to create extra interaction
> between job and filter.
>
> On the other hand, if we read through the filter, we should probably do the whole
> copy operation through the filter.
>
> What is the difference between a guest read and a backup-job read, from the
> filter's point of view? I think:
>
> For a guest read, the filter MUST read (as we must handle the guest request), and
> then, if we don't have too many in-flight requests, the RAM cache is not full,
> etc., we can handle the already-read data in terms of backup, i.e. copy it
> somewhere. Or we can drop it if we can't handle it at the moment.
>
> For a job read, we MAY even skip reading if our queues are full, postponing the job request.
>
> So
>
> Guest read: MUST read, MAY backup
> Job read: MAY read and backup
>
> So, reading through the filter opens the possibility of a common code path plus
> native prioritization of copy operations. This will of course need more backup
> refactoring and may be done as a separate step, but I definitely have to at least
> try to create the job above the filter.
>
> >
> >>>>                                        so let's calculate job_id before. Do
> >>>> +     * it in the same way like block_job_create
> >>>> +     */
> >>>> +    if (job_id == NULL && !(creation_flags & JOB_INTERNAL)) {
> >>>> +        job_id = bdrv_get_device_name(bs);
> >>>> +    }
> >>>> +
> >>>> +    backup_top = bdrv_backup_top_append(bs, target, copy_bitmap, errp);
> >>>> +    if (!backup_top) {
> >>>>            goto error;
> >>>>        }
> >>>>
> >>>> -    /* The target must match the source in size, so no resize here either */
> >>>> -    job->target = blk_new(BLK_PERM_WRITE,
> >>>> -                          BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
> >>>> -                          BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
> >>>> -    ret = blk_insert_bs(job->target, target, errp);
> >>>> -    if (ret < 0) {
> >>>> +    /* job->len is fixed, so we can't allow resize */
> >>>> +    job = block_job_create(job_id, &backup_job_driver, txn, bs, 0,
> >>>
> >>> Is there a reason you dropped PERM_CONSISTENT_READ here?
> >>
> >> Because we don't use this blk for reads now; we read through the backup_top child.
> >
> > Makes sense.
> >
> >>>> +                           all_except_resize, speed, creation_flags,
> >>>> +                           cb, opaque, errp);
> >>>> +    if (!job) {
> >>>>            goto error;
> >>>>        }
> >>>>
> >>>> +    job->source = backup_top->backing;
> >>>> +    job->target = ((BDRVBackupTopState *)backup_top->opaque)->target;
> >>>
> >>> This looks really ugly.  I think as long as the block job performs
> >>> writes itself, it should use its own BlockBackend.
> >>
> >> They are not BlockBackends, they are BdrvChildren.
> >
> > Exactly, which is what I don't like.  They are children of a node that
> > is implemented in a different file, it looks weird to use them here.
> >
> >> It was Kevin's idea to reuse filter's
> >> children in backup job:
> >>
> >> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01017.html
> >
> > It's still ugly if backup_top is in a different file.  Well, maybe just
> > to me.
> >
> >> Hmm, and this is also why I need PERM_WRITE in backup_top, to write to target.
> >>
> >>>
> >>> Alternatively, I think it would make sense for the backup-top filter to
> >>> offer functions that this job can use to issue writes to the target.
> >>> Then the backup job would no longer need direct access to the target as
> >>> a BdrvChild.
> >
> > So what would be the problem with this?
>
> I have no specific arguments against, I'll try. Kevin, do you have comments on this?

I haven't really followed this thread recently because there was other
stuff that needed my attention and you seemed to have a good discussion
with Max. If you think my input would be important at this point, I can
try to read up on it tomorrow.

Kevin

Hmm. OK, I think I now have enough notes from Max to think and to experiment, so you can safely skip it for now.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
@ 2019-04-13 16:08       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-13 16:08 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

16.01.2019 19:02, Max Reitz wrote:
> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>> Backup-top filter does copy-before-write operation. It should be
>> inserted above active disk and has a target node for CBW, like the
>> following:
>>
>>      +-------+
>>      | Guest |
>>      +---+---+
>>          |r,w
>>          v
>>      +---+-----------+  target   +---------------+
>>      | backup_top    |---------->| target(qcow2) |
>>      +---+-----------+   CBW     +---+-----------+
>>          |
>> backing |r,w
>>          v
>>      +---+---------+
>>      | Active disk |
>>      +-------------+
>>
>> The driver will be used in backup instead of write-notifiers.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/backup-top.h  |  43 +++++++
>>   block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
>>   block/Makefile.objs |   2 +
>>   3 files changed, 351 insertions(+)
>>   create mode 100644 block/backup-top.h
>>   create mode 100644 block/backup-top.c
> 

[..]

>> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
>> +                                         BlockDriverState *target,
>> +                                         HBitmap *copy_bitmap,
>> +                                         Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    BDRVBackupTopState *state;
>> +    BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter,
>> +                                                 NULL, BDRV_O_RDWR, errp);
>> +
>> +    if (!top) {
>> +        return NULL;
>> +    }
>> +
>> +    top->implicit = true;
>> +    top->total_sectors = source->total_sectors;
>> +    top->opaque = state = g_new0(BDRVBackupTopState, 1);
>> +    state->copy_bitmap = copy_bitmap;
>> +
>> +    bdrv_ref(target);
>> +    state->target = bdrv_attach_child(top, target, "target", &child_file, errp);
>> +    if (!state->target) {
>> +        bdrv_unref(target);
>> +        bdrv_unref(top);
>> +        return NULL;
>> +    }
>> +
>> +    bdrv_set_aio_context(top, bdrv_get_aio_context(source));
>> +    bdrv_set_aio_context(target, bdrv_get_aio_context(source));
>> +
>> +    bdrv_drained_begin(source);
>> +
>> +    bdrv_ref(top);
>> +    bdrv_append(top, source, &local_err);
>> +
>> +    if (local_err) {
>> +        bdrv_unref(top);
> 
> This is done automatically by bdrv_append().
> 
>> +    }
>> +
>> +    bdrv_drained_end(source);
>> +
>> +    if (local_err) {
>> +        bdrv_unref_child(top, state->target);
>> +        bdrv_unref(top);
>> +        error_propagate(errp, local_err);
>> +        return NULL;
>> +    }
>> +
>> +    return top;
>> +}
>> +
>> +void bdrv_backup_top_drop(BlockDriverState *bs)
>> +{
>> +    BDRVBackupTopState *s = bs->opaque;
>> +
>> +    AioContext *aio_context = bdrv_get_aio_context(bs);
>> +
>> +    aio_context_acquire(aio_context);
>> +
>> +    bdrv_drained_begin(bs);
>> +
>> +    bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, &error_abort);
>> +    bdrv_replace_node(bs, backing_bs(bs), &error_abort);
>> +    bdrv_set_backing_hd(bs, NULL, &error_abort);
> 
> This is done automatically in bdrv_close(), and after bs has been
> replaced by backing_bs(bs), I don't think new requests should come in,
> so I don't think this needs to be done here.

The subsequent move of backup_top back to job->blk then becomes impossible,
if we don't share WRITE on the source in backup_top_child_perm().

And I think, this function should drop all relations created by
bdrv_backup_top_append.

> 
>> +
>> +    bdrv_drained_end(bs);
>> +
>> +    if (s->target) {
>> +        bdrv_unref_child(bs, s->target);
>> +    }
> 
> And this should be done in a .bdrv_close() implementation, I think.
> 
> Max
> 
>> +    bdrv_unref(bs);
>> +
>> +    aio_context_release(aio_context);
>> +}
>> +


-- 
Best regards,
Vladimir



* Re: [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver
@ 2019-04-13 17:03         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 54+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2019-04-13 17:03 UTC (permalink / raw)
  To: Max Reitz, qemu-block, qemu-devel
  Cc: fam, stefanha, jcody, kwolf, Denis Lunev, eblake, jsnow

13.04.2019 19:08, Vladimir Sementsov-Ogievskiy wrote:
> 16.01.2019 19:02, Max Reitz wrote:
>> On 29.12.18 13:20, Vladimir Sementsov-Ogievskiy wrote:
>>> Backup-top filter does copy-before-write operation. It should be
>>> inserted above active disk and has a target node for CBW, like the
>>> following:
>>>
>>>      +-------+
>>>      | Guest |
>>>      +---+---+
>>>          |r,w
>>>          v
>>>      +---+-----------+  target   +---------------+
>>>      | backup_top    |---------->| target(qcow2) |
>>>      +---+-----------+   CBW     +---+-----------+
>>>          |
>>> backing |r,w
>>>          v
>>>      +---+---------+
>>>      | Active disk |
>>>      +-------------+
>>>
>>> The driver will be used in backup instead of write-notifiers.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/backup-top.h  |  43 +++++++
>>>   block/backup-top.c  | 306 ++++++++++++++++++++++++++++++++++++++++++++
>>>   block/Makefile.objs |   2 +
>>>   3 files changed, 351 insertions(+)
>>>   create mode 100644 block/backup-top.h
>>>   create mode 100644 block/backup-top.c
>>
> 
> [..]
> 
>>> +BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
>>> +                                         BlockDriverState *target,
>>> +                                         HBitmap *copy_bitmap,
>>> +                                         Error **errp)
>>> +{
>>> +    Error *local_err = NULL;
>>> +    BDRVBackupTopState *state;
>>> +    BlockDriverState *top = bdrv_new_open_driver(&bdrv_backup_top_filter,
>>> +                                                 NULL, BDRV_O_RDWR, errp);
>>> +
>>> +    if (!top) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    top->implicit = true;
>>> +    top->total_sectors = source->total_sectors;
>>> +    top->opaque = state = g_new0(BDRVBackupTopState, 1);
>>> +    state->copy_bitmap = copy_bitmap;
>>> +
>>> +    bdrv_ref(target);
>>> +    state->target = bdrv_attach_child(top, target, "target", &child_file, errp);
>>> +    if (!state->target) {
>>> +        bdrv_unref(target);
>>> +        bdrv_unref(top);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    bdrv_set_aio_context(top, bdrv_get_aio_context(source));
>>> +    bdrv_set_aio_context(target, bdrv_get_aio_context(source));
>>> +
>>> +    bdrv_drained_begin(source);
>>> +
>>> +    bdrv_ref(top);
>>> +    bdrv_append(top, source, &local_err);
>>> +
>>> +    if (local_err) {
>>> +        bdrv_unref(top);
>>
>> This is done automatically by bdrv_append().
>>
>>> +    }
>>> +
>>> +    bdrv_drained_end(source);
>>> +
>>> +    if (local_err) {
>>> +        bdrv_unref_child(top, state->target);
>>> +        bdrv_unref(top);
>>> +        error_propagate(errp, local_err);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    return top;
>>> +}
>>> +
>>> +void bdrv_backup_top_drop(BlockDriverState *bs)
>>> +{
>>> +    BDRVBackupTopState *s = bs->opaque;
>>> +
>>> +    AioContext *aio_context = bdrv_get_aio_context(bs);
>>> +
>>> +    aio_context_acquire(aio_context);
>>> +
>>> +    bdrv_drained_begin(bs);
>>> +
>>> +    bdrv_child_try_set_perm(bs->backing, 0, BLK_PERM_ALL, &error_abort);
>>> +    bdrv_replace_node(bs, backing_bs(bs), &error_abort);
>>> +    bdrv_set_backing_hd(bs, NULL, &error_abort);
>>
>> This is done automatically in bdrv_close(), and after bs has been
>> replaced by backing_bs(bs), I don't think new requests should come in,
>> so I don't think this needs to be done here.
> 
> The subsequent move of backup_top back to job->blk then becomes impossible,
> if we don't share WRITE on the source in backup_top_child_perm().
> 
> And I think, this function should drop all relations created by
> bdrv_backup_top_append.
> 
>>
>>> +
>>> +    bdrv_drained_end(bs);
>>> +
>>> +    if (s->target) {
>>> +        bdrv_unref_child(bs, s->target);
>>> +    }
>>
>> And this should be done in a .bdrv_close() implementation, I think.
>>

and therefore this one too. We don't have a .bdrv_open, so I'd prefer not to
have a .bdrv_close. We create this child in _top_append(), so it seems logical
to unref it in _top_drop().


-- 
Best regards,
Vladimir



end of thread, other threads:[~2019-04-13 17:04 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
2018-12-29 12:20 [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 01/11] block/backup: simplify backup_incremental_init_copy_bitmap Vladimir Sementsov-Ogievskiy
2019-01-14 13:10   ` Max Reitz
2019-01-14 13:13     ` Max Reitz
2019-01-14 14:01     ` Vladimir Sementsov-Ogievskiy
2019-01-14 14:13       ` Max Reitz
2019-01-14 14:48         ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:05           ` Max Reitz
2019-01-23  8:20             ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:19               ` Max Reitz
2019-01-23 14:36               ` Eric Blake
2019-01-24 14:20                 ` Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 02/11] block/backup: move to copy_bitmap with granularity Vladimir Sementsov-Ogievskiy
2019-01-14 14:10   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 03/11] block: improve should_update_child Vladimir Sementsov-Ogievskiy
2019-01-14 14:32   ` Max Reitz
2019-01-14 16:13     ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:17       ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 04/11] iotests: handle -f argument correctly for qemu_io_silent Vladimir Sementsov-Ogievskiy
2019-01-14 14:36   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 05/11] iotests: allow resume_drive by node name Vladimir Sementsov-Ogievskiy
2019-01-14 14:46   ` Max Reitz
2019-01-14 16:06     ` Vladimir Sementsov-Ogievskiy
2019-01-16 13:11       ` Max Reitz
2019-01-23 13:22         ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:31           ` Vladimir Sementsov-Ogievskiy
2019-01-23 13:33             ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 06/11] iotests: prepare 055 to graph changes during backup job Vladimir Sementsov-Ogievskiy
2019-01-16 13:48   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 07/11] block: introduce backup-top filter driver Vladimir Sementsov-Ogievskiy
2019-01-16 16:02   ` Max Reitz
2019-01-17 12:13     ` Vladimir Sementsov-Ogievskiy
2019-01-18 12:05       ` Max Reitz
2019-01-23 13:47         ` Vladimir Sementsov-Ogievskiy
2019-04-13 16:08     ` Vladimir Sementsov-Ogievskiy
2019-04-13 17:03       ` Vladimir Sementsov-Ogievskiy
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 08/11] block/io: refactor wait_serialising_requests Vladimir Sementsov-Ogievskiy
2019-01-16 16:18   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 09/11] block: add lock/unlock range functions Vladimir Sementsov-Ogievskiy
2019-01-16 16:36   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 10/11] block/backup: tiny refactor backup_job_create Vladimir Sementsov-Ogievskiy
2019-01-18 13:00   ` Max Reitz
2018-12-29 12:20 ` [Qemu-devel] [PATCH v5 11/11] block/backup: use backup-top instead of write notifiers Vladimir Sementsov-Ogievskiy
2019-01-18 14:56   ` Max Reitz
2019-01-28 11:29     ` Vladimir Sementsov-Ogievskiy
2019-01-28 15:59       ` Max Reitz
2019-01-28 16:44         ` Vladimir Sementsov-Ogievskiy
2019-01-28 16:53           ` Max Reitz
2019-01-28 17:14             ` Vladimir Sementsov-Ogievskiy
2019-01-28 17:40           ` Kevin Wolf
2019-01-28 19:00             ` Vladimir Sementsov-Ogievskiy
2019-01-23 15:26 ` [Qemu-devel] [PATCH v5 00/11] backup-top filter driver for backup no-reply
