* [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring
@ 2018-01-22 22:07 Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 01/16] block: BDS deletion during bdrv_drain_recurse Max Reitz
                   ` (16 more replies)
  0 siblings, 17 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This series implements an active and synchronous mirroring mode.

Currently, the mirror block job is passive and asynchronous: Depending on
your start conditions, some part of the source disk starts as "dirty".
Then, the block job will (as a background operation) continuously copy
dirty parts to the target disk until all of the source disk is clean.
In the meantime, any write to the source disk dirties the affected area.

One effect of this operational mode is that the job may never converge:
If the writes to the source happen faster than the block job copies data
to the target, the job can never finish.

When the active mode implemented in this series is enabled, every write
request to the source will automatically trigger a synchronous write to
the target right afterwards.  Therefore, the source can never get dirty
faster than data is copied to the target.  Most importantly, once source
and target are in sync (BLOCK_JOB_READY is emitted), they will not
diverge (unless e.g. an I/O error occurs).
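
To illustrate the idea, here is a rough sketch of what such a
write-blocking write path through the mirror node could look like.  This
is not the code from patch 14; the function name is made up, and the
MirrorBDSOpaque layout (patch 13) providing access to the job is an
assumption:

  /* Sketch only: forward a guest write to the source and then mirror the
   * very same data to the target before completing the request.  Assumes
   * the mirror_top node sits above the source (see patch 8). */
  static int coroutine_fn active_mirror_pwritev(BlockDriverState *bs,
                                                uint64_t offset,
                                                uint64_t bytes,
                                                QEMUIOVector *qiov,
                                                int flags)
  {
      MirrorBDSOpaque *s = bs->opaque; /* assumed layout */
      int ret;

      /* Write to the source first... */
      ret = bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
      if (ret < 0) {
          return ret;
      }

      /* ...then synchronously copy the same data to the target.  No
       * extra read from the source is needed, the data is already at
       * hand in @qiov. */
      return blk_co_pwritev(s->job->target, offset, bytes, qiov, 0);
  }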

Active mirroring also addresses a second issue of the passive mode: We
do not have to read data from the source in order to write it to the
target.  When new data is written to the source in active mode, it is
automatically mirrored to the target, which saves us the superfluous
read from the source.


Things to do on top of this series:
- Allow switching between active and passive mode at runtime: Mainly
  hinges on the question of how to expose it to the user (ideally
  through a generic block-job-set-option command)

- Implement an asynchronous active mode (launch both write operations to
  the source and the target at the same time, and do not wait for the
  target operation to finish)

- Integrate the mirror BDS more tightly into the BDS graph:  Both source
  and target should be BdrvChildren (and the source should not be the
  "backing" child).  I'm working on this in a follow-up.

- Improve the mirror job's coroutine use: The current state is more of a
  hack; a follow-up will make this nicer.

- Add read-write-blocking mode: This series adds the write-blocking
  mode, where every write blocks until the data has been mirrored to the
  target.  read-write-blocking would also mirror data on reads from the
  source, which saves some work (because that data does not have to be
  read twice) at the cost of added latency on read operations.
  (Will be in the same follow-up; a rough sketch follows below.)
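
The read side of that mode could look roughly like the sketch below.  It
is not part of this series; the function name and the opaque layout are
assumptions, and the dirty-bitmap check is omitted for brevity:

  /* Sketch only: serve a guest read from the source and immediately
   * mirror the freshly read data to the target so the background job
   * does not have to read it a second time. */
  static int coroutine_fn active_mirror_preadv(BlockDriverState *bs,
                                               uint64_t offset,
                                               uint64_t bytes,
                                               QEMUIOVector *qiov,
                                               int flags)
  {
      MirrorBDSOpaque *s = bs->opaque; /* assumed layout */
      int ret;

      ret = bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
      if (ret < 0) {
          return ret;
      }

      /* A failed mirror write would go through the job's usual error
       * handling; the guest read itself has already succeeded. */
      blk_co_pwritev(s->job->target, offset, bytes, qiov, 0);
      return ret;
  }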


v2:
- Dropped some work, moving it to a follow-up series (making source the
  file child, adding the read-write-blocking mode)
- Patches 1 and 2: Dropped BdrvDeletedStatus, instead replacing it by
  bdrv_ref()/bdrv_unref() pairs (with some precautions so we don't get
  into an infinite recursion)
  - And added a test for this, patch 3
- Patch 4: Just rebase conflicts
- Patch 5: Reworked the use of bytes_copied (now "bytes_handled"), fixed
      the mirror_co_read() comment -- everything will be made nicer
      in a follow-up, I promise!
- Patch 7: mirror_wait_for_free_in_flight_slot() must not put its own
      coroutine into a pseudo operation's waiting queue, or we may
      starve.  (I noticed this when running the iotest added by this
      series under qcow v1, so the test is there!)
- Patch 8: Drop the MirrorBlockJob.source field
- Patch 10: Fixed the function description, and there was a rebase
      conflict (due to an additional user of hbitmap_iter_next()).
- Patch 11: Block bitmaps use bytes instead of sectors now, so this
      patch gets a bit easier
- Patch 12: Rebase conflict because of the change to patch 7
- Patch 14:
  - Dropped the read-write-blocking mode (or rather, put it off to a follow-up)
  - Renamed "passive" -> "background" and
    "active-write" -> "write-blocking"
  - Some rebase conflicts
  - Reset dirty bitmap before the active write, re-set it on failure
    (instead of only clearing it on success); important to prevent
    double-writes (from background and active operations) -- see the
    ordering sketch after this changelog
  - Use a bounce buffer for active operations
- Patch 15:
  - Renamed "passive" -> "background" and
    "active-write" -> "write-blocking"
  - Aimed for 2.12 now...
- Patch 16: Use BlockBackends and aio_write instead of an own write -B
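
To illustrate the patch-14 note above about the dirty bitmap ordering,
here is a sketch of how an active write could handle the bitmap.  It is
illustrative only; the helper name and the error handling are
assumptions:

  /* Sketch only: clear the dirty bits *before* issuing the write so the
   * background iteration does not copy the same area concurrently, and
   * set them again on failure so the area is retried later. */
  static int coroutine_fn do_active_write(MirrorBlockJob *job,
                                          int64_t offset, uint64_t bytes,
                                          QEMUIOVector *qiov)
  {
      int ret;

      bdrv_reset_dirty_bitmap(job->dirty_bitmap, offset, bytes);

      ret = blk_co_pwritev(job->target, offset, bytes, qiov, 0);
      if (ret < 0) {
          /* Mark the area dirty again so the background job picks it up
           * once more. */
          bdrv_set_dirty_bitmap(job->dirty_bitmap, offset, bytes);
      }

      return ret;
  }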


git-backport-diff to v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/16:[0054] [FC] 'block: BDS deletion during bdrv_drain_recurse'
002/16:[down] 'block: BDS deletion in bdrv_do_drained_begin()'
003/16:[down] 'tests: Add bdrv-drain test for node deletion'
004/16:[0006] [FC] 'block/mirror: Pull out mirror_perform()'
005/16:[0050] [FC] 'block/mirror: Convert to coroutines'
006/16:[----] [-C] 'block/mirror: Use CoQueue to wait on in-flight ops'
007/16:[0021] [FC] 'block/mirror: Wait for in-flight op conflicts'
008/16:[0016] [FC] 'block/mirror: Use source as a BdrvChild'
009/16:[----] [--] 'block: Generalize should_update_child() rule'
010/16:[0005] [FC] 'hbitmap: Add @advance param to hbitmap_iter_next()'
011/16:[0011] [FC] 'block/dirty-bitmap: Add bdrv_dirty_iter_next_area'
012/16:[0017] [FC] 'block/mirror: Distinguish active from passive ops'
013/16:[down] 'block/mirror: Add MirrorBDSOpaque'
014/16:[0112] [FC] 'block/mirror: Add active mirroring'
015/16:[0011] [FC] 'block/mirror: Add copy mode QAPI interface'
016/16:[0023] [FC] 'iotests: Add test for active mirroring'


Max Reitz (16):
  block: BDS deletion during bdrv_drain_recurse
  block: BDS deletion in bdrv_do_drained_begin()
  tests: Add bdrv-drain test for node deletion
  block/mirror: Pull out mirror_perform()
  block/mirror: Convert to coroutines
  block/mirror: Use CoQueue to wait on in-flight ops
  block/mirror: Wait for in-flight op conflicts
  block/mirror: Use source as a BdrvChild
  block: Generalize should_update_child() rule
  hbitmap: Add @advance param to hbitmap_iter_next()
  block/dirty-bitmap: Add bdrv_dirty_iter_next_area
  block/mirror: Distinguish active from passive ops
  block/mirror: Add MirrorBDSOpaque
  block/mirror: Add active mirroring
  block/mirror: Add copy mode QAPI interface
  iotests: Add test for active mirroring

 qapi/block-core.json         |  29 ++-
 include/block/block_int.h    |   6 +-
 include/block/dirty-bitmap.h |   2 +
 include/qemu/hbitmap.h       |   5 +-
 block.c                      |  44 +++-
 block/backup.c               |   2 +-
 block/dirty-bitmap.c         |  53 ++++-
 block/io.c                   |  59 ++++-
 block/mirror.c               | 540 +++++++++++++++++++++++++++++++++----------
 blockdev.c                   |   9 +-
 tests/test-bdrv-drain.c      | 165 +++++++++++++
 tests/test-hbitmap.c         |  26 +--
 util/hbitmap.c               |  10 +-
 tests/qemu-iotests/151       | 114 +++++++++
 tests/qemu-iotests/151.out   |   5 +
 tests/qemu-iotests/group     |   1 +
 16 files changed, 910 insertions(+), 160 deletions(-)
 create mode 100755 tests/qemu-iotests/151
 create mode 100644 tests/qemu-iotests/151.out

-- 
2.14.3


* [Qemu-devel] [PATCH v2 01/16] block: BDS deletion during bdrv_drain_recurse
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 02/16] block: BDS deletion in bdrv_do_drained_begin() Max Reitz
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

Draining a BDS child may lead to other children of the same parent being
detached and/or deleted.  We should prepare for the former case (by
copying the children list before iterating through it) and prevent the
latter (by bdrv_ref()'ing all nodes if we are in the main loop).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7ea402352e..ca7dfecfc9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -189,31 +189,51 @@ static void bdrv_drain_invoke(BlockDriverState *bs, bool begin, bool recursive)
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
-    BdrvChild *child, *tmp;
+    BdrvChild *child;
     bool waited;
+    struct BDSToDrain {
+        BlockDriverState *bs;
+        QLIST_ENTRY(BDSToDrain) next;
+    };
+    QLIST_HEAD(, BDSToDrain) bs_list = QLIST_HEAD_INITIALIZER(bs_list);
+    bool in_main_loop =
+        qemu_get_current_aio_context() == qemu_get_aio_context();
 
     /* Wait for drained requests to finish */
     waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
 
-    QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
-        BlockDriverState *bs = child->bs;
-        bool in_main_loop =
-            qemu_get_current_aio_context() == qemu_get_aio_context();
-        assert(bs->refcnt > 0);
+    /* Draining children may result in other children being removed from this
+     * parent and maybe even deleted, so copy the children list first */
+    QLIST_FOREACH(child, &bs->children, next) {
+        struct BDSToDrain *bs2d = g_new0(struct BDSToDrain, 1);
+
+        bs2d->bs = child->bs;
         if (in_main_loop) {
             /* In case the recursive bdrv_drain_recurse processes a
              * block_job_defer_to_main_loop BH and modifies the graph,
-             * let's hold a reference to bs until we are done.
+             * let's hold a reference to the BDS until we are done.
              *
              * IOThread doesn't have such a BH, and it is not safe to call
              * bdrv_unref without BQL, so skip doing it there.
              */
-            bdrv_ref(bs);
+            bdrv_ref(bs2d->bs);
         }
-        waited |= bdrv_drain_recurse(bs);
+
+        QLIST_INSERT_HEAD(&bs_list, bs2d, next);
+    }
+
+    while (!QLIST_EMPTY(&bs_list)) {
+        struct BDSToDrain *bs2d = QLIST_FIRST(&bs_list);
+        QLIST_REMOVE(bs2d, next);
+
+        assert(bs2d->bs->refcnt > 0);
+
+        waited |= bdrv_drain_recurse(bs2d->bs);
         if (in_main_loop) {
-            bdrv_unref(bs);
+            bdrv_unref(bs2d->bs);
         }
+
+        g_free(bs2d);
     }
 
     return waited;
-- 
2.14.3


* [Qemu-devel] [PATCH v2 02/16] block: BDS deletion in bdrv_do_drained_begin()
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 01/16] block: BDS deletion during bdrv_drain_recurse Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 03/16] tests: Add bdrv-drain test for node deletion Max Reitz
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

Draining a BDS (in the main loop) may cause it to be deleted.  That
is rather suboptimal if we still plan to access it afterwards, so let us
enclose the main body of the function with a bdrv_ref()/bdrv_unref()
pair.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/io.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/block/io.c b/block/io.c
index ca7dfecfc9..1ff2ff0adb 100644
--- a/block/io.c
+++ b/block/io.c
@@ -294,12 +294,27 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
                            BdrvChild *parent)
 {
     BdrvChild *child, *next;
+    bool in_main_loop =
+        qemu_get_current_aio_context() == qemu_get_aio_context();
+    /* bdrv_close() invokes bdrv_drain() with bs->refcnt == 0; then,
+     * we may not invoke bdrv_ref()/bdrv_unref() because the latter
+     * would result in the refcount going back to 0, creating an
+     * infinite loop.
+     * Also, we have to be in the main loop because we may not call
+     * bdrv_unref() elsewhere.  But because of that, the BDS is not in
+     * danger of going away without the bdrv_ref()/bdrv_unref() pair
+     * elsewhere, so we are fine then. */
+    bool add_ref = in_main_loop && bs->refcnt > 0;
 
     if (qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, true, recursive, parent);
         return;
     }
 
+    if (add_ref) {
+        bdrv_ref(bs);
+    }
+
     /* Stop things in parent-to-child order */
     if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
         aio_disable_external(bdrv_get_aio_context(bs));
@@ -315,6 +330,10 @@ void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
             bdrv_do_drained_begin(child->bs, true, child);
         }
     }
+
+    if (add_ref) {
+        bdrv_unref(bs);
+    }
 }
 
 void bdrv_drained_begin(BlockDriverState *bs)
-- 
2.14.3


* [Qemu-devel] [PATCH v2 03/16] tests: Add bdrv-drain test for node deletion
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 01/16] block: BDS deletion during bdrv_drain_recurse Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 02/16] block: BDS deletion in bdrv_do_drained_begin() Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 04/16] block/mirror: Pull out mirror_perform() Max Reitz
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This patch adds two bdrv-drain tests for what happens if some BDS goes
away during the drainage.

The basic idea is that you have a parent BDS with some child nodes.
Then, you drain one of the children.  Because of that, the party who
actually owns the parent decides to (A) delete it, or (B) detach all its
children from it -- both while the child is still being drained.

A real-world case where this can happen is the mirror block job, which
may exit if you drain one of its children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/test-bdrv-drain.c | 165 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 165 insertions(+)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index d760e2b243..c8c9178f25 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -610,6 +610,169 @@ static void test_blockjob_drain_subtree(void)
     test_blockjob_common(BDRV_SUBTREE_DRAIN);
 }
 
+
+typedef struct BDRVTestTopState {
+    BdrvChild *wait_child;
+} BDRVTestTopState;
+
+static void bdrv_test_top_close(BlockDriverState *bs)
+{
+    BdrvChild *c, *next_c;
+    QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
+        bdrv_unref_child(bs, c);
+    }
+}
+
+static int coroutine_fn bdrv_test_top_co_preadv(BlockDriverState *bs,
+                                                uint64_t offset, uint64_t bytes,
+                                                QEMUIOVector *qiov, int flags)
+{
+    BDRVTestTopState *tts = bs->opaque;
+    return bdrv_co_preadv(tts->wait_child, offset, bytes, qiov, flags);
+}
+
+static BlockDriver bdrv_test_top_driver = {
+    .format_name            = "test_top_driver",
+    .instance_size          = sizeof(BDRVTestTopState),
+
+    .bdrv_close             = bdrv_test_top_close,
+    .bdrv_co_preadv         = bdrv_test_top_co_preadv,
+
+    .bdrv_child_perm        = bdrv_format_default_perms,
+};
+
+typedef struct TestCoDeleteByDrainData {
+    BlockBackend *blk;
+    bool detach_instead_of_delete;
+    bool done;
+} TestCoDeleteByDrainData;
+
+static void coroutine_fn test_co_delete_by_drain(void *opaque)
+{
+    TestCoDeleteByDrainData *dbdd = opaque;
+    BlockBackend *blk = dbdd->blk;
+    BlockDriverState *bs = blk_bs(blk);
+    BDRVTestTopState *tts = bs->opaque;
+    void *buffer = g_malloc(65536);
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = buffer,
+        .iov_len  = 65536,
+    };
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    /* Pretend some internal write operation from parent to child.
+     * Important: We have to read from the child, not from the parent!
+     * Draining works by first propagating it all up the tree to the
+     * root and then waiting for drainage from root to the leaves
+     * (protocol nodes).  If we have a request waiting on the root,
+     * everything will be drained before we go back down the tree, but
+     * we do not want that.  We want to be in the middle of draining
+     * when the following request returns. */
+    bdrv_co_preadv(tts->wait_child, 0, 65536, &qiov, 0);
+    /* The drain is running concurrently, so it must have its own
+     * reference to @bs */
+    g_assert_cmpint(bs->refcnt, ==, 2);
+
+    if (!dbdd->detach_instead_of_delete) {
+        blk_unref(blk);
+    } else {
+        BdrvChild *c, *next_c;
+        QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
+            bdrv_unref_child(bs, c);
+        }
+    }
+
+    dbdd->done = true;
+}
+
+/**
+ * Test what happens when some BDS has some children, you drain one of
+ * them and this results in the BDS being deleted.
+ *
+ * If @detach_instead_of_delete is set, the BDS is not going to be
+ * deleted but will only detach all of its children.
+ */
+static void do_test_delete_by_drain(bool detach_instead_of_delete)
+{
+    BlockBackend *blk;
+    BlockDriverState *bs, *child_bs, *null_bs;
+    BDRVTestTopState *tts;
+    TestCoDeleteByDrainData dbdd;
+    Coroutine *co;
+
+    bs = bdrv_new_open_driver(&bdrv_test_top_driver, "top", BDRV_O_RDWR,
+                              &error_abort);
+    bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
+    tts = bs->opaque;
+
+    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                        &error_abort);
+    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
+
+    /* This child will be the one to pass requests through to, and
+     * it will stall until a drain occurs */
+    child_bs = bdrv_new_open_driver(&bdrv_test, "child", BDRV_O_RDWR,
+                                    &error_abort);
+    child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
+    /* Takes our reference to child_bs */
+    tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child", &child_file,
+                                        &error_abort);
+
+    /* This child is just there to be deleted
+     * (for detach_instead_of_delete == true) */
+    null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
+                        &error_abort);
+    bdrv_attach_child(bs, null_bs, "null-child", &child_file, &error_abort);
+
+    blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    blk_insert_bs(blk, bs, &error_abort);
+
+    /* Referenced by blk now */
+    bdrv_unref(bs);
+
+    g_assert_cmpint(bs->refcnt, ==, 1);
+    g_assert_cmpint(child_bs->refcnt, ==, 1);
+    g_assert_cmpint(null_bs->refcnt, ==, 1);
+
+
+    dbdd = (TestCoDeleteByDrainData){
+        .blk = blk,
+        .detach_instead_of_delete = detach_instead_of_delete,
+        .done = false,
+    };
+    co = qemu_coroutine_create(test_co_delete_by_drain, &dbdd);
+    qemu_coroutine_enter(co);
+
+    /* Drain the child while the read operation is still pending.
+     * This should result in the operation finishing and
+     * test_co_delete_by_drain() resuming.  Thus, @bs will be deleted
+     * and the coroutine will exit while this drain operation is still
+     * in progress. */
+    bdrv_ref(child_bs);
+    bdrv_drain(child_bs);
+    bdrv_unref(child_bs);
+
+    while (!dbdd.done) {
+        aio_poll(qemu_get_aio_context(), true);
+    }
+
+    if (detach_instead_of_delete) {
+        /* Here, the reference has not passed over to the coroutine,
+         * so we have to delete the BB ourselves */
+        blk_unref(blk);
+    }
+}
+
+
+static void test_delete_by_drain(void)
+{
+    do_test_delete_by_drain(false);
+    do_test_delete_by_drain(true);
+}
+
+
 int main(int argc, char **argv)
 {
     bdrv_init();
@@ -647,5 +810,7 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/blockjob/drain_subtree",
                     test_blockjob_drain_subtree);
 
+    g_test_add_func("/bdrv-drain/deletion", test_delete_by_drain);
+
     return g_test_run();
 }
-- 
2.14.3


* [Qemu-devel] [PATCH v2 04/16] block/mirror: Pull out mirror_perform()
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (2 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 03/16] tests: Add bdrv-drain test for node deletion Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines Max Reitz
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

When converting mirror's I/O to coroutines, we are going to need a point
where these coroutines are created.  mirror_perform() is going to be
that point.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index c9badc1203..4066788ee2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -82,6 +82,12 @@ typedef struct MirrorOp {
     uint64_t bytes;
 } MirrorOp;
 
+typedef enum MirrorMethod {
+    MIRROR_METHOD_COPY,
+    MIRROR_METHOD_ZERO,
+    MIRROR_METHOD_DISCARD,
+} MirrorMethod;
+
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
                                             int error)
 {
@@ -321,6 +327,22 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
     }
 }
 
+static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
+                               unsigned bytes, MirrorMethod mirror_method)
+{
+    switch (mirror_method) {
+    case MIRROR_METHOD_COPY:
+        return mirror_do_read(s, offset, bytes);
+    case MIRROR_METHOD_ZERO:
+    case MIRROR_METHOD_DISCARD:
+        mirror_do_zero_or_discard(s, offset, bytes,
+                                  mirror_method == MIRROR_METHOD_DISCARD);
+        return bytes;
+    default:
+        abort();
+    }
+}
+
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->source;
@@ -387,11 +409,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         int ret;
         int64_t io_bytes;
         int64_t io_bytes_acct;
-        enum MirrorMethod {
-            MIRROR_METHOD_COPY,
-            MIRROR_METHOD_ZERO,
-            MIRROR_METHOD_DISCARD
-        } mirror_method = MIRROR_METHOD_COPY;
+        MirrorMethod mirror_method = MIRROR_METHOD_COPY;
 
         assert(!(offset % s->granularity));
         ret = bdrv_block_status_above(source, NULL, offset,
@@ -429,22 +447,11 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         }
 
         io_bytes = mirror_clip_bytes(s, offset, io_bytes);
-        switch (mirror_method) {
-        case MIRROR_METHOD_COPY:
-            io_bytes = io_bytes_acct = mirror_do_read(s, offset, io_bytes);
-            break;
-        case MIRROR_METHOD_ZERO:
-        case MIRROR_METHOD_DISCARD:
-            mirror_do_zero_or_discard(s, offset, io_bytes,
-                                      mirror_method == MIRROR_METHOD_DISCARD);
-            if (write_zeroes_ok) {
-                io_bytes_acct = 0;
-            } else {
-                io_bytes_acct = io_bytes;
-            }
-            break;
-        default:
-            abort();
+        io_bytes = mirror_perform(s, offset, io_bytes, mirror_method);
+        if (mirror_method != MIRROR_METHOD_COPY && write_zeroes_ok) {
+            io_bytes_acct = 0;
+        } else {
+            io_bytes_acct = io_bytes;
         }
         assert(io_bytes);
         offset += io_bytes;
@@ -638,7 +645,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
                 continue;
             }
 
-            mirror_do_zero_or_discard(s, offset, bytes, false);
+            mirror_perform(s, offset, bytes, MIRROR_METHOD_ZERO);
             offset += bytes;
         }
 
-- 
2.14.3


* [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (3 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 04/16] block/mirror: Pull out mirror_perform() Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-02-27  7:44   ` Fam Zheng
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 06/16] block/mirror: Use CoQueue to wait on in-flight ops Max Reitz
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

In order to talk to the source BDS (and maybe in the future to the
target BDS as well) directly, we need to convert our existing AIO
requests into coroutine I/O requests.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 154 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 92 insertions(+), 62 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 4066788ee2..71a8e66850 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -80,6 +80,10 @@ typedef struct MirrorOp {
     QEMUIOVector qiov;
     int64_t offset;
     uint64_t bytes;
+
+    /* The pointee is set by mirror_co_read(), mirror_co_zero(), and
+     * mirror_co_discard() before yielding for the first time */
+    int64_t *bytes_handled;
 } MirrorOp;
 
 typedef enum MirrorMethod {
@@ -101,7 +105,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
     }
 }
 
-static void mirror_iteration_done(MirrorOp *op, int ret)
+static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
 {
     MirrorBlockJob *s = op->s;
     struct iovec *iov;
@@ -138,9 +142,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
     }
 }
 
-static void mirror_write_complete(void *opaque, int ret)
+static void coroutine_fn mirror_write_complete(MirrorOp *op, int ret)
 {
-    MirrorOp *op = opaque;
     MirrorBlockJob *s = op->s;
 
     aio_context_acquire(blk_get_aio_context(s->common.blk));
@@ -157,9 +160,8 @@ static void mirror_write_complete(void *opaque, int ret)
     aio_context_release(blk_get_aio_context(s->common.blk));
 }
 
-static void mirror_read_complete(void *opaque, int ret)
+static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
 {
-    MirrorOp *op = opaque;
     MirrorBlockJob *s = op->s;
 
     aio_context_acquire(blk_get_aio_context(s->common.blk));
@@ -174,8 +176,11 @@ static void mirror_read_complete(void *opaque, int ret)
 
         mirror_iteration_done(op, ret);
     } else {
-        blk_aio_pwritev(s->target, op->offset, &op->qiov,
-                        0, mirror_write_complete, op);
+        int ret;
+
+        ret = blk_co_pwritev(s->target, op->offset,
+                             op->qiov.size, &op->qiov, 0);
+        mirror_write_complete(op, ret);
     }
     aio_context_release(blk_get_aio_context(s->common.blk));
 }
@@ -232,60 +237,57 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
     s->waiting_for_io = false;
 }
 
-/* Submit async read while handling COW.
- * Returns: The number of bytes copied after and including offset,
- *          excluding any bytes copied prior to offset due to alignment.
- *          This will be @bytes if no alignment is necessary, or
- *          (new_end - offset) if tail is rounded up or down due to
- *          alignment or buffer limit.
+/* Perform a mirror copy operation.
+ *
+ * *op->bytes_handled is set to the number of bytes copied after and
+ * including offset, excluding any bytes copied prior to offset due
+ * to alignment.  This will be op->bytes if no alignment is necessary,
+ * or (new_end - op->offset) if the tail is rounded up or down due to
+ * alignment or buffer limit.
  */
-static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
-                               uint64_t bytes)
+static void coroutine_fn mirror_co_read(void *opaque)
 {
+    MirrorOp *op = opaque;
+    MirrorBlockJob *s = op->s;
     BlockBackend *source = s->common.blk;
     int nb_chunks;
     uint64_t ret;
-    MirrorOp *op;
     uint64_t max_bytes;
 
     max_bytes = s->granularity * s->max_iov;
 
     /* We can only handle as much as buf_size at a time. */
-    bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
-    assert(bytes);
-    assert(bytes < BDRV_REQUEST_MAX_BYTES);
-    ret = bytes;
+    op->bytes = MIN(s->buf_size, MIN(max_bytes, op->bytes));
+    assert(op->bytes);
+    assert(op->bytes < BDRV_REQUEST_MAX_BYTES);
+    *op->bytes_handled = op->bytes;
 
     if (s->cow_bitmap) {
-        ret += mirror_cow_align(s, &offset, &bytes);
+        *op->bytes_handled += mirror_cow_align(s, &op->offset, &op->bytes);
     }
-    assert(bytes <= s->buf_size);
+    /* Cannot exceed BDRV_REQUEST_MAX_BYTES + INT_MAX */
+    assert(*op->bytes_handled <= UINT_MAX);
+    assert(op->bytes <= s->buf_size);
     /* The offset is granularity-aligned because:
      * 1) Caller passes in aligned values;
      * 2) mirror_cow_align is used only when target cluster is larger. */
-    assert(QEMU_IS_ALIGNED(offset, s->granularity));
+    assert(QEMU_IS_ALIGNED(op->offset, s->granularity));
     /* The range is sector-aligned, since bdrv_getlength() rounds up. */
-    assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
-    nb_chunks = DIV_ROUND_UP(bytes, s->granularity);
+    assert(QEMU_IS_ALIGNED(op->bytes, BDRV_SECTOR_SIZE));
+    nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
 
     while (s->buf_free_count < nb_chunks) {
-        trace_mirror_yield_in_flight(s, offset, s->in_flight);
+        trace_mirror_yield_in_flight(s, op->offset, s->in_flight);
         mirror_wait_for_io(s);
     }
 
-    /* Allocate a MirrorOp that is used as an AIO callback.  */
-    op = g_new(MirrorOp, 1);
-    op->s = s;
-    op->offset = offset;
-    op->bytes = bytes;
-
     /* Now make a QEMUIOVector taking enough granularity-sized chunks
      * from s->buf_free.
      */
     qemu_iovec_init(&op->qiov, nb_chunks);
     while (nb_chunks-- > 0) {
         MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
-        size_t remaining = bytes - op->qiov.size;
+        size_t remaining = op->bytes - op->qiov.size;
 
         QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
         s->buf_free_count--;
@@ -294,53 +296,81 @@ static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
 
     /* Copy the dirty cluster.  */
     s->in_flight++;
-    s->bytes_in_flight += bytes;
-    trace_mirror_one_iteration(s, offset, bytes);
+    s->bytes_in_flight += op->bytes;
+    trace_mirror_one_iteration(s, op->offset, op->bytes);
 
-    blk_aio_preadv(source, offset, &op->qiov, 0, mirror_read_complete, op);
-    return ret;
+    ret = blk_co_preadv(source, op->offset, op->bytes, &op->qiov, 0);
+    mirror_read_complete(op, ret);
 }
 
-static void mirror_do_zero_or_discard(MirrorBlockJob *s,
-                                      int64_t offset,
-                                      uint64_t bytes,
-                                      bool is_discard)
+static void coroutine_fn mirror_co_zero(void *opaque)
 {
-    MirrorOp *op;
+    MirrorOp *op = opaque;
+    int ret;
 
-    /* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
-     * so the freeing in mirror_iteration_done is nop. */
-    op = g_new0(MirrorOp, 1);
-    op->s = s;
-    op->offset = offset;
-    op->bytes = bytes;
+    op->s->in_flight++;
+    op->s->bytes_in_flight += op->bytes;
+    *op->bytes_handled = op->bytes;
 
-    s->in_flight++;
-    s->bytes_in_flight += bytes;
-    if (is_discard) {
-        blk_aio_pdiscard(s->target, offset,
-                         op->bytes, mirror_write_complete, op);
-    } else {
-        blk_aio_pwrite_zeroes(s->target, offset,
-                              op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
-                              mirror_write_complete, op);
-    }
+    ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
+                               op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+    mirror_write_complete(op, ret);
+}
+
+static void coroutine_fn mirror_co_discard(void *opaque)
+{
+    MirrorOp *op = opaque;
+    int ret;
+
+    op->s->in_flight++;
+    op->s->bytes_in_flight += op->bytes;
+    *op->bytes_handled = op->bytes;
+
+    ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
+    mirror_write_complete(op, ret);
 }
 
 static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
                                unsigned bytes, MirrorMethod mirror_method)
 {
+    MirrorOp *op;
+    Coroutine *co;
+    int64_t bytes_handled = -1;
+
+    op = g_new(MirrorOp, 1);
+    *op = (MirrorOp){
+        .s              = s,
+        .offset         = offset,
+        .bytes          = bytes,
+        .bytes_handled  = &bytes_handled,
+    };
+
     switch (mirror_method) {
     case MIRROR_METHOD_COPY:
-        return mirror_do_read(s, offset, bytes);
+        co = qemu_coroutine_create(mirror_co_read, op);
+        break;
     case MIRROR_METHOD_ZERO:
+        co = qemu_coroutine_create(mirror_co_zero, op);
+        break;
     case MIRROR_METHOD_DISCARD:
-        mirror_do_zero_or_discard(s, offset, bytes,
-                                  mirror_method == MIRROR_METHOD_DISCARD);
-        return bytes;
+        co = qemu_coroutine_create(mirror_co_discard, op);
+        break;
     default:
         abort();
     }
+
+    qemu_coroutine_enter(co);
+    /* At this point, ownership of op has been moved to the coroutine
+     * and the object may already be freed */
+
+    /* Assert that this value has been set */
+    assert(bytes_handled >= 0);
+
+    /* Same assertion as in mirror_co_read() (and for mirror_co_zero()
+     * and mirror_co_discard(), bytes_handled == op->bytes, which
+     * is the @bytes parameter given to this function) */
+    assert(bytes_handled <= UINT_MAX);
+    return bytes_handled;
 }
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
-- 
2.14.3


* [Qemu-devel] [PATCH v2 06/16] block/mirror: Use CoQueue to wait on in-flight ops
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (4 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-02-27  8:37   ` Fam Zheng
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 07/16] block/mirror: Wait for in-flight op conflicts Max Reitz
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

Attach a CoQueue to each in-flight operation so that, if we need to wait
for any of them, we can use it to wait instead of just blindly yielding
and hoping for some operation to wake us.

A later patch will use this infrastructure to allow requests accessing
the same area of the virtual disk to specifically wait for each other.
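
(For readers unfamiliar with CoQueue, this is the general pattern in
isolation; it is only an illustration and not part of the patch, and the
struct and function names are made up:)

  #include "qemu/osdep.h"
  #include "qemu/coroutine.h"

  /* Illustration: waiters block on the queue of the operation they
   * depend on; the operation wakes all of them when it completes. */
  typedef struct Op {
      CoQueue waiting_requests;
      /* ... */
  } Op;

  static void op_init(Op *op)
  {
      qemu_co_queue_init(&op->waiting_requests);
  }

  static void coroutine_fn wait_for_op(Op *op)
  {
      /* Yield until op_done() is called for this operation */
      qemu_co_queue_wait(&op->waiting_requests, NULL);
  }

  static void coroutine_fn op_done(Op *op)
  {
      /* Wake every coroutine currently waiting on this operation */
      qemu_co_queue_restart_all(&op->waiting_requests);
  }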

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 71a8e66850..fdd6385766 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/coroutine.h"
 #include "trace.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
@@ -34,6 +35,8 @@ typedef struct MirrorBuffer {
     QSIMPLEQ_ENTRY(MirrorBuffer) next;
 } MirrorBuffer;
 
+typedef struct MirrorOp MirrorOp;
+
 typedef struct MirrorBlockJob {
     BlockJob common;
     RateLimit limit;
@@ -67,15 +70,15 @@ typedef struct MirrorBlockJob {
     unsigned long *in_flight_bitmap;
     int in_flight;
     int64_t bytes_in_flight;
+    QTAILQ_HEAD(MirrorOpList, MirrorOp) ops_in_flight;
     int ret;
     bool unmap;
-    bool waiting_for_io;
     int target_cluster_size;
     int max_iov;
     bool initial_zeroing_ongoing;
 } MirrorBlockJob;
 
-typedef struct MirrorOp {
+struct MirrorOp {
     MirrorBlockJob *s;
     QEMUIOVector qiov;
     int64_t offset;
@@ -84,7 +87,11 @@ typedef struct MirrorOp {
     /* The pointee is set by mirror_co_read(), mirror_co_zero(), and
      * mirror_co_discard() before yielding for the first time */
     int64_t *bytes_handled;
-} MirrorOp;
+
+    CoQueue waiting_requests;
+
+    QTAILQ_ENTRY(MirrorOp) next;
+};
 
 typedef enum MirrorMethod {
     MIRROR_METHOD_COPY,
@@ -125,7 +132,9 @@ static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
 
     chunk_num = op->offset / s->granularity;
     nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
+
     bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
+    QTAILQ_REMOVE(&s->ops_in_flight, op, next);
     if (ret >= 0) {
         if (s->cow_bitmap) {
             bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
@@ -135,11 +144,9 @@ static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
         }
     }
     qemu_iovec_destroy(&op->qiov);
-    g_free(op);
 
-    if (s->waiting_for_io) {
-        qemu_coroutine_enter(s->common.co);
-    }
+    qemu_co_queue_restart_all(&op->waiting_requests);
+    g_free(op);
 }
 
 static void coroutine_fn mirror_write_complete(MirrorOp *op, int ret)
@@ -231,10 +238,11 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
 
 static inline void mirror_wait_for_io(MirrorBlockJob *s)
 {
-    assert(!s->waiting_for_io);
-    s->waiting_for_io = true;
-    qemu_coroutine_yield();
-    s->waiting_for_io = false;
+    MirrorOp *op;
+
+    op = QTAILQ_FIRST(&s->ops_in_flight);
+    assert(op);
+    qemu_co_queue_wait(&op->waiting_requests, NULL);
 }
 
 /* Perform a mirror copy operation.
@@ -344,6 +352,7 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
         .bytes          = bytes,
         .bytes_handled  = &bytes_handled,
     };
+    qemu_co_queue_init(&op->waiting_requests);
 
     switch (mirror_method) {
     case MIRROR_METHOD_COPY:
@@ -359,6 +368,7 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
         abort();
     }
 
+    QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next);
     qemu_coroutine_enter(co);
     /* At this point, ownership of op has been moved to the coroutine
      * and the object may already be freed */
@@ -1287,6 +1297,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         }
     }
 
+    QTAILQ_INIT(&s->ops_in_flight);
+
     trace_mirror_start(bs, s, opaque);
     block_job_start(&s->common);
     return;
-- 
2.14.3


* [Qemu-devel] [PATCH v2 07/16] block/mirror: Wait for in-flight op conflicts
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (5 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 06/16] block/mirror: Use CoQueue to wait on in-flight ops Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 08/16] block/mirror: Use source as a BdrvChild Max Reitz
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This patch makes the mirror code differentiate between simply waiting
for any operation to complete (mirror_wait_for_free_in_flight_slot())
and specifically waiting for all operations touching a certain range of
the virtual disk to complete (mirror_wait_on_conflicts()).

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 84 insertions(+), 18 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index fdd6385766..9dbe6a9130 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/coroutine.h"
+#include "qemu/range.h"
 #include "trace.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
@@ -88,6 +89,7 @@ struct MirrorOp {
      * mirror_co_discard() before yielding for the first time */
     int64_t *bytes_handled;
 
+    bool is_pseudo_op;
     CoQueue waiting_requests;
 
     QTAILQ_ENTRY(MirrorOp) next;
@@ -112,6 +114,41 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
     }
 }
 
+static void coroutine_fn mirror_wait_on_conflicts(MirrorOp *self,
+                                                  MirrorBlockJob *s,
+                                                  uint64_t offset,
+                                                  uint64_t bytes)
+{
+    uint64_t self_start_chunk = offset / s->granularity;
+    uint64_t self_end_chunk = DIV_ROUND_UP(offset + bytes, s->granularity);
+    uint64_t self_nb_chunks = self_end_chunk - self_start_chunk;
+
+    while (find_next_bit(s->in_flight_bitmap, self_end_chunk,
+                         self_start_chunk) < self_end_chunk &&
+           s->ret >= 0)
+    {
+        MirrorOp *op;
+
+        QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
+            uint64_t op_start_chunk = op->offset / s->granularity;
+            uint64_t op_nb_chunks = DIV_ROUND_UP(op->offset + op->bytes,
+                                                 s->granularity) -
+                                    op_start_chunk;
+
+            if (op == self) {
+                continue;
+            }
+
+            if (ranges_overlap(self_start_chunk, self_nb_chunks,
+                               op_start_chunk, op_nb_chunks))
+            {
+                qemu_co_queue_wait(&op->waiting_requests, NULL);
+                break;
+            }
+        }
+    }
+}
+
 static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
 {
     MirrorBlockJob *s = op->s;
@@ -236,13 +273,22 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
     return ret;
 }
 
-static inline void mirror_wait_for_io(MirrorBlockJob *s)
+static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 {
     MirrorOp *op;
 
-    op = QTAILQ_FIRST(&s->ops_in_flight);
-    assert(op);
-    qemu_co_queue_wait(&op->waiting_requests, NULL);
+    QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
+        /* Do not wait on pseudo ops, because it may in turn wait on
+         * some other operation to start, which may in fact be the
+         * caller of this function.  Since there is only one pseudo op
+         * at any given time, we will always find some real operation
+         * to wait on. */
+        if (!op->is_pseudo_op) {
+            qemu_co_queue_wait(&op->waiting_requests, NULL);
+            return;
+        }
+    }
+    abort();
 }
 
 /* Perform a mirror copy operation.
@@ -286,7 +332,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
 
     while (s->buf_free_count < nb_chunks) {
         trace_mirror_yield_in_flight(s, op->offset, s->in_flight);
-        mirror_wait_for_io(s);
+        mirror_wait_for_free_in_flight_slot(s);
     }
 
     /* Now make a QEMUIOVector taking enough granularity-sized chunks
@@ -386,8 +432,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
     BlockDriverState *source = s->source;
-    int64_t offset, first_chunk;
-    uint64_t delay_ns = 0;
+    MirrorOp *pseudo_op;
+    int64_t offset;
+    uint64_t delay_ns = 0, ret = 0;
     /* At least the first dirty chunk is mirrored in one iteration. */
     int nb_chunks = 1;
     bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
@@ -403,11 +450,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     }
     bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
 
-    first_chunk = offset / s->granularity;
-    while (test_bit(first_chunk, s->in_flight_bitmap)) {
-        trace_mirror_yield_in_flight(s, offset, s->in_flight);
-        mirror_wait_for_io(s);
-    }
+    mirror_wait_on_conflicts(NULL, s, offset, 1);
 
     block_job_pause_point(&s->common);
 
@@ -444,6 +487,21 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
                                    nb_chunks * s->granularity);
     bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
 
+    /* Before claiming an area in the in-flight bitmap, we have to
+     * create a MirrorOp for it so that conflicting requests can wait
+     * for it.  mirror_perform() will create the real MirrorOps later,
+     * for now we just create a pseudo operation that will wake up all
+     * conflicting requests once all real operations have been
+     * launched. */
+    pseudo_op = g_new(MirrorOp, 1);
+    *pseudo_op = (MirrorOp){
+        .offset         = offset,
+        .bytes          = nb_chunks * s->granularity,
+        .is_pseudo_op   = true,
+    };
+    qemu_co_queue_init(&pseudo_op->waiting_requests);
+    QTAILQ_INSERT_TAIL(&s->ops_in_flight, pseudo_op, next);
+
     bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
     while (nb_chunks > 0 && offset < s->bdev_length) {
         int ret;
@@ -479,11 +537,12 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 
         while (s->in_flight >= MAX_IN_FLIGHT) {
             trace_mirror_yield_in_flight(s, offset, s->in_flight);
-            mirror_wait_for_io(s);
+            mirror_wait_for_free_in_flight_slot(s);
         }
 
         if (s->ret < 0) {
-            return 0;
+            ret = 0;
+            goto fail;
         }
 
         io_bytes = mirror_clip_bytes(s, offset, io_bytes);
@@ -500,7 +559,14 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
             delay_ns = ratelimit_calculate_delay(&s->limit, io_bytes_acct);
         }
     }
-    return delay_ns;
+
+    ret = delay_ns;
+fail:
+    QTAILQ_REMOVE(&s->ops_in_flight, pseudo_op, next);
+    qemu_co_queue_restart_all(&pseudo_op->waiting_requests);
+    g_free(pseudo_op);
+
+    return ret;
 }
 
 static void mirror_free_init(MirrorBlockJob *s)
@@ -527,7 +593,7 @@ static void mirror_free_init(MirrorBlockJob *s)
 static void mirror_wait_for_all_io(MirrorBlockJob *s)
 {
     while (s->in_flight > 0) {
-        mirror_wait_for_io(s);
+        mirror_wait_for_free_in_flight_slot(s);
     }
 }
 
@@ -681,7 +747,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
             if (s->in_flight >= MAX_IN_FLIGHT) {
                 trace_mirror_yield(s, UINT64_MAX, s->buf_free_count,
                                    s->in_flight);
-                mirror_wait_for_io(s);
+                mirror_wait_for_free_in_flight_slot(s);
                 continue;
             }
 
@@ -856,7 +922,7 @@ static void coroutine_fn mirror_run(void *opaque)
             if (s->in_flight >= MAX_IN_FLIGHT || s->buf_free_count == 0 ||
                 (cnt == 0 && s->in_flight > 0)) {
                 trace_mirror_yield(s, cnt, s->buf_free_count, s->in_flight);
-                mirror_wait_for_io(s);
+                mirror_wait_for_free_in_flight_slot(s);
                 continue;
             } else if (cnt != 0) {
                 delay_ns = mirror_iteration(s);
-- 
2.14.3


* [Qemu-devel] [PATCH v2 08/16] block/mirror: Use source as a BdrvChild
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (6 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 07/16] block/mirror: Wait for in-flight op conflicts Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 09/16] block: Generalize should_update_child() rule Max Reitz
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

With this, the mirror_top_bs is no longer just a technically required
node in the BDS graph but actually represents the block job operation.

Also, drop MirrorBlockJob.source, as we can reach it through
mirror_top_bs->backing.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9dbe6a9130..2363e79563 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -43,7 +43,6 @@ typedef struct MirrorBlockJob {
     RateLimit limit;
     BlockBackend *target;
     BlockDriverState *mirror_top_bs;
-    BlockDriverState *source;
     BlockDriverState *base;
 
     /* The name of the graph node to replace */
@@ -303,7 +302,6 @@ static void coroutine_fn mirror_co_read(void *opaque)
 {
     MirrorOp *op = opaque;
     MirrorBlockJob *s = op->s;
-    BlockBackend *source = s->common.blk;
     int nb_chunks;
     uint64_t ret;
     uint64_t max_bytes;
@@ -353,7 +351,8 @@ static void coroutine_fn mirror_co_read(void *opaque)
     s->bytes_in_flight += op->bytes;
     trace_mirror_one_iteration(s, op->offset, op->bytes);
 
-    ret = blk_co_preadv(source, op->offset, op->bytes, &op->qiov, 0);
+    ret = bdrv_co_preadv(s->mirror_top_bs->backing, op->offset, op->bytes,
+                         &op->qiov, 0);
     mirror_read_complete(op, ret);
 }
 
@@ -431,7 +430,7 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
-    BlockDriverState *source = s->source;
+    BlockDriverState *source = s->mirror_top_bs->backing->bs;
     MirrorOp *pseudo_op;
     int64_t offset;
     uint64_t delay_ns = 0, ret = 0;
@@ -606,7 +605,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
     MirrorExitData *data = opaque;
     AioContext *replace_aio_context = NULL;
-    BlockDriverState *src = s->source;
+    BlockDriverState *src = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     BlockDriverState *mirror_top_bs = s->mirror_top_bs;
     Error *local_err = NULL;
@@ -721,7 +720,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
     int64_t offset;
     BlockDriverState *base = s->base;
-    BlockDriverState *bs = s->source;
+    BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     int ret;
     int64_t count;
@@ -803,7 +802,7 @@ static void coroutine_fn mirror_run(void *opaque)
 {
     MirrorBlockJob *s = opaque;
     MirrorExitData *data;
-    BlockDriverState *bs = s->source;
+    BlockDriverState *bs = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
     bool need_drain = true;
     int64_t length;
@@ -1290,7 +1289,6 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     /* The block job now has a reference to this node */
     bdrv_unref(mirror_top_bs);
 
-    s->source = bs;
     s->mirror_top_bs = mirror_top_bs;
 
     /* No resize for the target either; while the mirror is still running, a
-- 
2.14.3


* [Qemu-devel] [PATCH v2 09/16] block: Generalize should_update_child() rule
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (7 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 08/16] block/mirror: Use source as a BdrvChild Max Reitz
@ 2018-01-22 22:07 ` Max Reitz
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next() Max Reitz
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:07 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

Currently, bdrv_replace_node() refuses to create loops from one BDS to
itself if the BDS to be replaced is the backing node of the BDS to
replace it: Say there is a node A and a node B.  Replacing B by A means
making all references to B point to A.  If B is a child of A (i.e. A has
a reference to B), that would mean we would have to make this reference
point to A itself -- so we'd create a loop.

bdrv_replace_node() (through should_update_child()) refuses to do so only
if B is the backing node of A.  However, there is no reason why we should
create loops when B is some other kind of child of A, either: the BDS
graph should never contain loops, so we should always refuse to create
them.

If B is a child of A and B is to be replaced by A, we should simply
leave B in place there because it is the most sensible choice.

A more specific argument would be: Putting filter drivers into the BDS
graph is basically the same as appending an overlay to a backing chain.
But the main child BDS of a filter driver is not "backing" but "file",
so restricting the no-loop rule to backing nodes would fail here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  2 ++
 block.c                   | 44 ++++++++++++++++++++++++++++++++++----------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 29cafa4236..03f3fdd129 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -587,6 +587,8 @@ struct BdrvChild {
     QLIST_ENTRY(BdrvChild) next_parent;
 };
 
+typedef QLIST_HEAD(BdrvChildList, BdrvChild) BdrvChildList;
+
 /*
  * Note: the function bdrv_append() copies and swaps contents of
  * BlockDriverStates, so if you add new fields to this struct, please
diff --git a/block.c b/block.c
index a8da4f2b25..df50825d94 100644
--- a/block.c
+++ b/block.c
@@ -3320,16 +3320,39 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
         return false;
     }
 
-    if (c->role == &child_backing) {
-        /* If @from is a backing file of @to, ignore the child to avoid
-         * creating a loop. We only want to change the pointer of other
-         * parents. */
-        QLIST_FOREACH(to_c, &to->children, next) {
-            if (to_c == c) {
-                break;
-            }
-        }
-        if (to_c) {
+    /* If the child @c belongs to the BDS @to, replacing the current
+     * c->bs by @to would mean to create a loop.
+     *
+     * Such a case occurs when appending a BDS to a backing chain.
+     * For instance, imagine the following chain:
+     *
+     *   guest device -> node A -> further backing chain...
+     *
+     * Now we create a new BDS B which we want to put on top of this
+     * chain, so we first attach A as its backing node:
+     *
+     *                   node B
+     *                     |
+     *                     v
+     *   guest device -> node A -> further backing chain...
+     *
+     * Finally we want to replace A by B.  When doing that, we want to
+     * replace all pointers to A by pointers to B -- except for the
+     * pointer from B because (1) that would create a loop, and (2)
+     * that pointer should simply stay intact:
+     *
+     *   guest device -> node B
+     *                     |
+     *                     v
+     *                   node A -> further backing chain...
+     *
+     * In general, when replacing a node A (c->bs) by a node B (@to),
+     * if A is a child of B, that means we cannot replace A by B there
+     * because that would create a loop.  Silently detaching A from B
+     * is also not really an option.  So overall just leaving A in
+     * place there is the most sensible choice. */
+    QLIST_FOREACH(to_c, &to->children, next) {
+        if (to_c == c) {
             return false;
         }
     }
@@ -3355,6 +3378,7 @@ void bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
 
     /* Put all parents into @list and calculate their cumulative permissions */
     QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
+        assert(c->bs == from);
         if (!should_update_child(c, to)) {
             continue;
         }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next()
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (8 preceding siblings ...)
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 09/16] block: Generalize should_update_child() rule Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-02-27  8:59   ` Fam Zheng
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area Max Reitz
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This new parameter allows the caller to just query the next dirty
position without moving the iterator.
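
As an aside (illustration only, not part of the patch), a caller can now
peek at the next dirty position without consuming it, roughly like this:

  HBitmapIter hbi;
  int64_t peeked, taken;

  hbitmap_iter_init(&hbi, hb, 0);           /* hb: some initialized HBitmap */
  peeked = hbitmap_iter_next(&hbi, false);  /* query only, iterator unchanged */
  taken  = hbitmap_iter_next(&hbi, true);   /* same position, now consumed */
  assert(peeked == taken);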

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/qemu/hbitmap.h |  5 ++++-
 block/backup.c         |  2 +-
 block/dirty-bitmap.c   |  2 +-
 tests/test-hbitmap.c   | 26 +++++++++++++-------------
 util/hbitmap.c         | 10 +++++++---
 5 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 6b6490ecad..ddca52c48e 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -324,11 +324,14 @@ void hbitmap_free_meta(HBitmap *hb);
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
+ * @advance: If true, advance the iterator.  Otherwise, the next call
+ *           of this function will return the same result (if that
+ *           position is still dirty).
  *
  * Return the next bit that is set in @hbi's associated HBitmap,
  * or -1 if all remaining bits are zero.
  */
-int64_t hbitmap_iter_next(HBitmapIter *hbi);
+int64_t hbitmap_iter_next(HBitmapIter *hbi, bool advance);
 
 /**
  * hbitmap_iter_next_word:
diff --git a/block/backup.c b/block/backup.c
index 4a16a37229..c083ce0deb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -368,7 +368,7 @@ static int coroutine_fn backup_run_incremental(BackupBlockJob *job)
     HBitmapIter hbi;
 
     hbitmap_iter_init(&hbi, job->copy_bitmap, 0);
-    while ((cluster = hbitmap_iter_next(&hbi)) != -1) {
+    while ((cluster = hbitmap_iter_next(&hbi, true)) != -1) {
         do {
             if (yield_and_check(job)) {
                 return 0;
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 7879d13ddb..50564fa1e2 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -498,7 +498,7 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
 
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-    return hbitmap_iter_next(&iter->hbi);
+    return hbitmap_iter_next(&iter->hbi, true);
 }
 
 /* Called within bdrv_dirty_bitmap_lock..unlock */
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 9091c639b3..2a2aa5bd43 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -46,7 +46,7 @@ static void hbitmap_test_check(TestHBitmapData *data,
 
     i = first;
     for (;;) {
-        next = hbitmap_iter_next(&hbi);
+        next = hbitmap_iter_next(&hbi, true);
         if (next < 0) {
             next = data->size;
         }
@@ -435,25 +435,25 @@ static void test_hbitmap_iter_granularity(TestHBitmapData *data,
     /* Note that hbitmap_test_check has to be invoked manually in this test.  */
     hbitmap_test_init(data, 131072 << 7, 7);
     hbitmap_iter_init(&hbi, data->hb, 0);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
     hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
     hbitmap_iter_init(&hbi, data->hb, 0);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
     hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
     hbitmap_test_set(data, (131072 << 7) - 8, 8);
     hbitmap_iter_init(&hbi, data->hb, 0);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 
     hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
-    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
+    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
 }
 
 static void hbitmap_test_set_boundary_bits(TestHBitmapData *data, ssize_t diff)
@@ -893,7 +893,7 @@ static void test_hbitmap_serialize_zeroes(TestHBitmapData *data,
     for (i = 0; i < num_positions; i++) {
         hbitmap_deserialize_zeroes(data->hb, positions[i], min_l1, true);
         hbitmap_iter_init(&iter, data->hb, 0);
-        next = hbitmap_iter_next(&iter);
+        next = hbitmap_iter_next(&iter, true);
         if (i == num_positions - 1) {
             g_assert_cmpint(next, ==, -1);
         } else {
@@ -919,10 +919,10 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData *data,
 
     hbitmap_iter_init(&hbi, data->hb, BITS_PER_LONG - 1);
 
-    hbitmap_iter_next(&hbi);
+    hbitmap_iter_next(&hbi, true);
 
     hbitmap_reset_all(data->hb);
-    hbitmap_iter_next(&hbi);
+    hbitmap_iter_next(&hbi, true);
 }
 
 static void test_hbitmap_next_zero_check(TestHBitmapData *data, int64_t start)
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 289778a55c..ceb7b75bc6 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -141,7 +141,7 @@ unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi)
     return cur;
 }
 
-int64_t hbitmap_iter_next(HBitmapIter *hbi)
+int64_t hbitmap_iter_next(HBitmapIter *hbi, bool advance)
 {
     unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1] &
             hbi->hb->levels[HBITMAP_LEVELS - 1][hbi->pos];
@@ -154,8 +154,12 @@ int64_t hbitmap_iter_next(HBitmapIter *hbi)
         }
     }
 
-    /* The next call will resume work from the next bit.  */
-    hbi->cur[HBITMAP_LEVELS - 1] = cur & (cur - 1);
+    if (advance) {
+        /* The next call will resume work from the next bit.  */
+        hbi->cur[HBITMAP_LEVELS - 1] = cur & (cur - 1);
+    } else {
+        hbi->cur[HBITMAP_LEVELS - 1] = cur;
+    }
     item = ((uint64_t)hbi->pos << BITS_PER_LEVEL) + ctzl(cur);
 
     return item << hbi->granularity;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (9 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next() Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-02-27  9:06   ` Fam Zheng
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops Max Reitz
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This new function allows the caller to look for a consecutively dirty area
in a dirty bitmap.
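
For illustration (not part of the patch), a caller could walk all
consecutive dirty areas below some end_offset roughly like this:

  uint64_t off;
  int len;

  while (bdrv_dirty_iter_next_area(iter, end_offset, &off, &len)) {
      /* [off, off + len) is dirty and ends at or before end_offset */
      handle_dirty_area(off, len);   /* hypothetical consumer */
  }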

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/dirty-bitmap.h |  2 ++
 block/dirty-bitmap.c         | 51 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index a591c27213..35f3ccc44c 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -79,6 +79,8 @@ void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
 void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
                                     int64_t offset, int64_t bytes);
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
+bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
+                               uint64_t *offset, int *bytes);
 void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 50564fa1e2..484b5dda43 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -501,6 +501,57 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
     return hbitmap_iter_next(&iter->hbi, true);
 }
 
+/**
+ * Return the next consecutively dirty area in the dirty bitmap
+ * belonging to the given iterator @iter.
+ *
+ * @max_offset: Maximum value that may be returned for
+ *              *offset + *bytes
+ * @offset:     Will contain the start offset of the next dirty area
+ * @bytes:      Will contain the length of the next dirty area
+ *
+ * Returns: True if a dirty area could be found before max_offset
+ *          (which means that *offset and *bytes then contain valid
+ *          values), false otherwise.
+ */
+bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
+                               uint64_t *offset, int *bytes)
+{
+    uint32_t granularity = bdrv_dirty_bitmap_granularity(iter->bitmap);
+    uint64_t gran_max_offset;
+    int64_t ret;
+    int size;
+
+    if (max_offset == iter->bitmap->size) {
+        /* If max_offset points to the image end, round it up by the
+         * bitmap granularity */
+        gran_max_offset = ROUND_UP(max_offset, granularity);
+    } else {
+        gran_max_offset = max_offset;
+    }
+
+    ret = hbitmap_iter_next(&iter->hbi, false);
+    if (ret < 0 || ret + granularity > gran_max_offset) {
+        return false;
+    }
+
+    *offset = ret;
+    size = 0;
+
+    assert(granularity <= INT_MAX);
+
+    do {
+        /* Advance iterator */
+        ret = hbitmap_iter_next(&iter->hbi, true);
+        size += granularity;
+    } while (ret + granularity <= gran_max_offset &&
+             hbitmap_iter_next(&iter->hbi, false) == ret + granularity &&
+             size <= INT_MAX - granularity);
+
+    *bytes = MIN(size, max_offset - *offset);
+    return true;
+}
+
 /* Called within bdrv_dirty_bitmap_lock..unlock */
 void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
                                   int64_t offset, int64_t bytes)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (10 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-02-27  9:12   ` Fam Zheng
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 13/16] block/mirror: Add MirrorBDSOpaque Max Reitz
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

Currently, the mirror block job only knows passive operations.  But once
we introduce active writes, we need to distinguish between the two; for
example, mirror_wait_for_free_in_flight_slot() should wait for a passive
operation because active writes will not use the same in-flight slots.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2363e79563..bb46f3c4e9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -89,6 +89,7 @@ struct MirrorOp {
     int64_t *bytes_handled;
 
     bool is_pseudo_op;
+    bool is_active_write;
     CoQueue waiting_requests;
 
     QTAILQ_ENTRY(MirrorOp) next;
@@ -281,8 +282,10 @@ static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
          * some other operation to start, which may in fact be the
          * caller of this function.  Since there is only one pseudo op
          * at any given time, we will always find some real operation
-         * to wait on. */
-        if (!op->is_pseudo_op) {
+         * to wait on.
+         * Also, only non-active operations use up in-flight slots, so
+         * we can ignore active operations. */
+        if (!op->is_pseudo_op && !op->is_active_write) {
             qemu_co_queue_wait(&op->waiting_requests, NULL);
             return;
         }
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 13/16] block/mirror: Add MirrorBDSOpaque
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (11 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring Max Reitz
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This will allow us to access the block job data when the mirror block
driver becomes more complex.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/mirror.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index bb46f3c4e9..c1987b0230 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -78,6 +78,10 @@ typedef struct MirrorBlockJob {
     bool initial_zeroing_ongoing;
 } MirrorBlockJob;
 
+typedef struct MirrorBDSOpaque {
+    MirrorBlockJob *job;
+} MirrorBDSOpaque;
+
 struct MirrorOp {
     MirrorBlockJob *s;
     QEMUIOVector qiov;
@@ -607,6 +611,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 {
     MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
     MirrorExitData *data = opaque;
+    MirrorBDSOpaque *bs_opaque = s->mirror_top_bs->opaque;
     AioContext *replace_aio_context = NULL;
     BlockDriverState *src = s->mirror_top_bs->backing->bs;
     BlockDriverState *target_bs = blk_bs(s->target);
@@ -699,6 +704,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
     blk_set_perm(job->blk, 0, BLK_PERM_ALL, &error_abort);
     blk_insert_bs(job->blk, mirror_top_bs, &error_abort);
 
+    bs_opaque->job = NULL;
     block_job_completed(&s->common, data->ret);
 
     g_free(data);
@@ -1232,6 +1238,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
                              Error **errp)
 {
     MirrorBlockJob *s;
+    MirrorBDSOpaque *bs_opaque;
     BlockDriverState *mirror_top_bs;
     bool target_graph_mod;
     bool target_is_backing;
@@ -1265,6 +1272,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
         mirror_top_bs->implicit = true;
     }
     mirror_top_bs->total_sectors = bs->total_sectors;
+    bs_opaque = g_new0(MirrorBDSOpaque, 1);
+    mirror_top_bs->opaque = bs_opaque;
     bdrv_set_aio_context(mirror_top_bs, bdrv_get_aio_context(bs));
 
     /* bdrv_append takes ownership of the mirror_top_bs reference, need to keep
@@ -1289,6 +1298,8 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     if (!s) {
         goto fail;
     }
+    bs_opaque->job = s;
+
     /* The block job now has a reference to this node */
     bdrv_unref(mirror_top_bs);
 
@@ -1378,6 +1389,7 @@ fail:
 
         g_free(s->replaces);
         blk_unref(s->target);
+        bs_opaque->job = NULL;
         block_job_early_fail(&s->common);
     }
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (12 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 13/16] block/mirror: Add MirrorBDSOpaque Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-02-27  9:34   ` Fam Zheng
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface Max Reitz
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This patch implements active synchronous mirroring.  In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target.  Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).

Active mode is completely optional and is currently always disabled.  A
later patch will add a way for users to enable it.
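
Conceptually (simplified from the hunks below; bounce buffering and error
handling omitted), the write path in write-blocking mode becomes:

  /* sketch only -- see bdrv_mirror_top_pwritev() in this patch */
  op  = active_write_prepare(s->job, offset, bytes);       /* serialize */
  ret = bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
  if (ret >= 0) {
      do_sync_target_write(s->job, offset, bytes, qiov, flags);
  }
  active_write_settle(op);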

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json |  18 +++++
 block/mirror.c       | 188 +++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 200 insertions(+), 6 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 89ed2bc6a4..ba1fd736f5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -937,6 +937,24 @@
 { 'enum': 'MirrorSyncMode',
   'data': ['top', 'full', 'none', 'incremental'] }
 
+##
+# @MirrorCopyMode:
+#
+# An enumeration whose values tell the mirror block job when to
+# trigger writes to the target.
+#
+# @background: copy data in background only.
+#
+# @write-blocking: when data is written to the source, write it
+#                  (synchronously) to the target as well.  In
+#                  addition, data is copied in background just like in
+#                  @background mode.
+#
+# Since: 2.12
+##
+{ 'enum': 'MirrorCopyMode',
+  'data': ['background', 'write-blocking'] }
+
 ##
 # @BlockJobType:
 #
diff --git a/block/mirror.c b/block/mirror.c
index c1987b0230..83082adb64 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -53,8 +53,12 @@ typedef struct MirrorBlockJob {
     Error *replace_blocker;
     bool is_none_mode;
     BlockMirrorBackingMode backing_mode;
+    MirrorCopyMode copy_mode;
     BlockdevOnError on_source_error, on_target_error;
     bool synced;
+    /* Set when the target is synced (dirty bitmap is clean, nothing
+     * in flight) and the job is running in active mode */
+    bool actively_synced;
     bool should_complete;
     int64_t granularity;
     size_t buf_size;
@@ -76,6 +80,7 @@ typedef struct MirrorBlockJob {
     int target_cluster_size;
     int max_iov;
     bool initial_zeroing_ongoing;
+    int in_active_write_counter;
 } MirrorBlockJob;
 
 typedef struct MirrorBDSOpaque {
@@ -109,6 +114,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
                                             int error)
 {
     s->synced = false;
+    s->actively_synced = false;
     if (read) {
         return block_job_error_action(&s->common, s->on_source_error,
                                       true, error);
@@ -277,7 +283,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
     return ret;
 }
 
-static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
+static inline void mirror_wait_for_any_operation(MirrorBlockJob *s, bool active)
 {
     MirrorOp *op;
 
@@ -286,10 +292,8 @@ static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
          * some other operation to start, which may in fact be the
          * caller of this function.  Since there is only one pseudo op
          * at any given time, we will always find some real operation
-         * to wait on.
-         * Also, only non-active operations use up in-flight slots, so
-         * we can ignore active operations. */
-        if (!op->is_pseudo_op && !op->is_active_write) {
+         * to wait on. */
+        if (!op->is_pseudo_op && op->is_active_write == active) {
             qemu_co_queue_wait(&op->waiting_requests, NULL);
             return;
         }
@@ -297,6 +301,12 @@ static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
     abort();
 }
 
+static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
+{
+    /* Only non-active operations use up in-flight slots */
+    mirror_wait_for_any_operation(s, false);
+}
+
 /* Perform a mirror copy operation.
  *
  * *op->bytes_handled is set to the number of bytes copied after and
@@ -854,6 +864,7 @@ static void coroutine_fn mirror_run(void *opaque)
         /* Report BLOCK_JOB_READY and wait for complete. */
         block_job_event_ready(&s->common);
         s->synced = true;
+        s->actively_synced = true;
         while (!block_job_is_cancelled(&s->common) && !s->should_complete) {
             block_job_yield(&s->common);
         }
@@ -905,6 +916,12 @@ static void coroutine_fn mirror_run(void *opaque)
         int64_t cnt, delta;
         bool should_complete;
 
+        /* Do not start passive operations while there are active
+         * writes in progress */
+        while (s->in_active_write_counter) {
+            mirror_wait_for_any_operation(s, true);
+        }
+
         if (s->ret < 0) {
             ret = s->ret;
             goto immediate_exit;
@@ -952,6 +969,9 @@ static void coroutine_fn mirror_run(void *opaque)
                  */
                 block_job_event_ready(&s->common);
                 s->synced = true;
+                if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
+                    s->actively_synced = true;
+                }
             }
 
             should_complete = s->should_complete ||
@@ -1142,6 +1162,120 @@ static const BlockJobDriver commit_active_job_driver = {
     .drain                  = mirror_drain,
 };
 
+static void do_sync_target_write(MirrorBlockJob *job, uint64_t offset,
+                                 uint64_t bytes, QEMUIOVector *qiov, int flags)
+{
+    BdrvDirtyBitmapIter *iter;
+    QEMUIOVector target_qiov;
+    uint64_t dirty_offset;
+    int dirty_bytes;
+
+    qemu_iovec_init(&target_qiov, qiov->niov);
+
+    iter = bdrv_dirty_iter_new(job->dirty_bitmap);
+    bdrv_set_dirty_iter(iter, offset);
+
+    while (true) {
+        bool valid_area;
+        int ret;
+
+        bdrv_dirty_bitmap_lock(job->dirty_bitmap);
+        valid_area = bdrv_dirty_iter_next_area(iter, offset + bytes,
+                                               &dirty_offset, &dirty_bytes);
+        if (!valid_area) {
+            bdrv_dirty_bitmap_unlock(job->dirty_bitmap);
+            break;
+        }
+
+        bdrv_reset_dirty_bitmap_locked(job->dirty_bitmap,
+                                       dirty_offset, dirty_bytes);
+        bdrv_dirty_bitmap_unlock(job->dirty_bitmap);
+
+        job->common.len += dirty_bytes;
+
+        assert(dirty_offset - offset <= SIZE_MAX);
+        if (qiov) {
+            qemu_iovec_reset(&target_qiov);
+            qemu_iovec_concat(&target_qiov, qiov,
+                              dirty_offset - offset, dirty_bytes);
+        }
+
+        ret = blk_co_pwritev(job->target, dirty_offset, dirty_bytes,
+                             qiov ? &target_qiov : NULL, flags);
+        if (ret >= 0) {
+            job->common.offset += dirty_bytes;
+        } else {
+            BlockErrorAction action;
+
+            bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_offset, dirty_bytes);
+            job->actively_synced = false;
+
+            action = mirror_error_action(job, false, -ret);
+            if (action == BLOCK_ERROR_ACTION_REPORT) {
+                if (!job->ret) {
+                    job->ret = ret;
+                }
+                break;
+            }
+        }
+    }
+
+    bdrv_dirty_iter_free(iter);
+    qemu_iovec_destroy(&target_qiov);
+}
+
+static MirrorOp *coroutine_fn active_write_prepare(MirrorBlockJob *s,
+                                                   uint64_t offset,
+                                                   uint64_t bytes)
+{
+    MirrorOp *op;
+    uint64_t start_chunk = offset / s->granularity;
+    uint64_t end_chunk = DIV_ROUND_UP(offset + bytes, s->granularity);
+
+    op = g_new(MirrorOp, 1);
+    *op = (MirrorOp){
+        .s                  = s,
+        .offset             = offset,
+        .bytes              = bytes,
+        .is_active_write    = true,
+    };
+    qemu_co_queue_init(&op->waiting_requests);
+    QTAILQ_INSERT_TAIL(&s->ops_in_flight, op, next);
+
+    s->in_active_write_counter++;
+
+    mirror_wait_on_conflicts(op, s, offset, bytes);
+
+    bitmap_set(s->in_flight_bitmap, start_chunk, end_chunk - start_chunk);
+
+    return op;
+}
+
+static void coroutine_fn active_write_settle(MirrorOp *op)
+{
+    uint64_t start_chunk = op->offset / op->s->granularity;
+    uint64_t end_chunk = DIV_ROUND_UP(op->offset + op->bytes,
+                                      op->s->granularity);
+
+    if (!--op->s->in_active_write_counter && op->s->actively_synced) {
+        BdrvChild *source = op->s->mirror_top_bs->backing;
+
+        if (QLIST_FIRST(&source->bs->parents) == source &&
+            QLIST_NEXT(source, next_parent) == NULL)
+        {
+            /* Assert that we are back in sync once all active write
+             * operations are settled.
+             * Note that we can only assert this if the mirror node
+             * is the source node's only parent. */
+            assert(!bdrv_get_dirty_count(op->s->dirty_bitmap));
+        }
+    }
+    bitmap_clear(op->s->in_flight_bitmap, start_chunk, end_chunk - start_chunk);
+    QTAILQ_REMOVE(&op->s->ops_in_flight, op, next);
+    qemu_co_queue_restart_all(&op->waiting_requests);
+    g_free(op);
+}
+
 static int coroutine_fn bdrv_mirror_top_preadv(BlockDriverState *bs,
     uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
@@ -1151,7 +1285,48 @@ static int coroutine_fn bdrv_mirror_top_preadv(BlockDriverState *bs,
 static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
     uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
-    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
+    MirrorOp *op = NULL;
+    MirrorBDSOpaque *s = bs->opaque;
+    QEMUIOVector bounce_qiov;
+    void *bounce_buf;
+    int ret = 0;
+    bool copy_to_target;
+
+    copy_to_target = s->job->ret >= 0 &&
+                     s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+
+    if (copy_to_target) {
+        /* The guest might concurrently modify the data to write; but
+         * the data on source and destination must match, so we have
+         * to use a bounce buffer if we are going to write to the
+         * target now. */
+        bounce_buf = qemu_blockalign(bs, bytes);
+        iov_to_buf_full(qiov->iov, qiov->niov, 0, bounce_buf, bytes);
+
+        qemu_iovec_init(&bounce_qiov, 1);
+        qemu_iovec_add(&bounce_qiov, bounce_buf, bytes);
+        qiov = &bounce_qiov;
+
+        op = active_write_prepare(s->job, offset, bytes);
+    }
+
+    ret = bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
+    if (ret < 0) {
+        goto out;
+    }
+
+    if (copy_to_target) {
+        do_sync_target_write(s->job, offset, bytes, qiov, flags);
+    }
+
+out:
+    if (copy_to_target) {
+        active_write_settle(op);
+
+        qemu_iovec_destroy(&bounce_qiov);
+        qemu_vfree(bounce_buf);
+    }
+    return ret;
 }
 
 static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
@@ -1340,6 +1515,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     s->on_target_error = on_target_error;
     s->is_none_mode = is_none_mode;
     s->backing_mode = backing_mode;
+    s->copy_mode = MIRROR_COPY_MODE_BACKGROUND;
     s->base = base;
     s->granularity = granularity;
     s->buf_size = ROUND_UP(buf_size, granularity);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (13 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-02-27  9:38   ` Fam Zheng
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 16/16] iotests: Add test for active mirroring Max Reitz
  2018-02-24 15:42 ` [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

This patch allows the user to specify whether to use active or only
passive mode for mirror block jobs.  Currently, this setting will remain
constant for the duration of the entire block job.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qapi/block-core.json      | 11 +++++++++--
 include/block/block_int.h |  4 +++-
 block/mirror.c            | 12 +++++++-----
 blockdev.c                |  9 ++++++++-
 4 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ba1fd736f5..5fafa5fcac 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1573,6 +1573,9 @@
 #         written. Both will result in identical contents.
 #         Default is true. (Since 2.4)
 #
+# @copy-mode: when to copy data to the destination; defaults to 'background'
+#             (Since: 2.12)
+#
 # Since: 1.3
 ##
 { 'struct': 'DriveMirror',
@@ -1582,7 +1585,7 @@
             '*speed': 'int', '*granularity': 'uint32',
             '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError',
-            '*unmap': 'bool' } }
+            '*unmap': 'bool', '*copy-mode': 'MirrorCopyMode' } }
 
 ##
 # @BlockDirtyBitmap:
@@ -1761,6 +1764,9 @@
 #                    above @device. If this option is not given, a node name is
 #                    autogenerated. (Since: 2.9)
 #
+# @copy-mode: when to copy data to the destination; defaults to 'background'
+#             (Since: 2.12)
+#
 # Returns: nothing on success.
 #
 # Since: 2.6
@@ -1781,7 +1787,8 @@
             '*speed': 'int', '*granularity': 'uint32',
             '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError',
-            '*filter-node-name': 'str' } }
+            '*filter-node-name': 'str',
+            '*copy-mode': 'MirrorCopyMode' } }
 
 ##
 # @block_set_io_throttle:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 03f3fdd129..1fda4d3d43 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -948,6 +948,7 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
  * @filter_node_name: The node name that should be assigned to the filter
  * driver that the mirror job inserts into the graph above @bs. NULL means that
  * a node name should be autogenerated.
+ * @copy_mode: When to trigger writes to the target.
  * @errp: Error object.
  *
  * Start a mirroring operation on @bs.  Clusters that are allocated
@@ -961,7 +962,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
-                  bool unmap, const char *filter_node_name, Error **errp);
+                  bool unmap, const char *filter_node_name,
+                  MirrorCopyMode copy_mode, Error **errp);
 
 /*
  * backup_job_create:
diff --git a/block/mirror.c b/block/mirror.c
index 83082adb64..3b23886a5a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1409,7 +1409,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
                              const BlockJobDriver *driver,
                              bool is_none_mode, BlockDriverState *base,
                              bool auto_complete, const char *filter_node_name,
-                             bool is_mirror,
+                             bool is_mirror, MirrorCopyMode copy_mode,
                              Error **errp)
 {
     MirrorBlockJob *s;
@@ -1515,7 +1515,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
     s->on_target_error = on_target_error;
     s->is_none_mode = is_none_mode;
     s->backing_mode = backing_mode;
-    s->copy_mode = MIRROR_COPY_MODE_BACKGROUND;
+    s->copy_mode = copy_mode;
     s->base = base;
     s->granularity = granularity;
     s->buf_size = ROUND_UP(buf_size, granularity);
@@ -1582,7 +1582,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
-                  bool unmap, const char *filter_node_name, Error **errp)
+                  bool unmap, const char *filter_node_name,
+                  MirrorCopyMode copy_mode, Error **errp)
 {
     bool is_none_mode;
     BlockDriverState *base;
@@ -1597,7 +1598,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                      speed, granularity, buf_size, backing_mode,
                      on_source_error, on_target_error, unmap, NULL, NULL,
                      &mirror_job_driver, is_none_mode, base, false,
-                     filter_node_name, true, errp);
+                     filter_node_name, true, copy_mode, errp);
 }
 
 void commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -1620,7 +1621,8 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
                      MIRROR_LEAVE_BACKING_CHAIN,
                      on_error, on_error, true, cb, opaque,
                      &commit_active_job_driver, false, base, auto_complete,
-                     filter_node_name, false, &local_err);
+                     filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
+                     &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto error_restore_flags;
diff --git a/blockdev.c b/blockdev.c
index 8e977eef11..35d8228819 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3491,6 +3491,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                                    bool has_unmap, bool unmap,
                                    bool has_filter_node_name,
                                    const char *filter_node_name,
+                                   bool has_copy_mode, MirrorCopyMode copy_mode,
                                    Error **errp)
 {
 
@@ -3515,6 +3516,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     if (!has_filter_node_name) {
         filter_node_name = NULL;
     }
+    if (!has_copy_mode) {
+        copy_mode = MIRROR_COPY_MODE_BACKGROUND;
+    }
 
     if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "granularity",
@@ -3545,7 +3549,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                  has_replaces ? replaces : NULL,
                  speed, granularity, buf_size, sync, backing_mode,
                  on_source_error, on_target_error, unmap, filter_node_name,
-                 errp);
+                 copy_mode, errp);
 }
 
 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -3686,6 +3690,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
                            arg->has_on_target_error, arg->on_target_error,
                            arg->has_unmap, arg->unmap,
                            false, NULL,
+                           arg->has_copy_mode, arg->copy_mode,
                            &local_err);
     bdrv_unref(target_bs);
     error_propagate(errp, local_err);
@@ -3706,6 +3711,7 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
                          BlockdevOnError on_target_error,
                          bool has_filter_node_name,
                          const char *filter_node_name,
+                         bool has_copy_mode, MirrorCopyMode copy_mode,
                          Error **errp)
 {
     BlockDriverState *bs;
@@ -3738,6 +3744,7 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
                            has_on_target_error, on_target_error,
                            true, true,
                            has_filter_node_name, filter_node_name,
+                           has_copy_mode, copy_mode,
                            &local_err);
     error_propagate(errp, local_err);
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 16/16] iotests: Add test for active mirroring
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (14 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface Max Reitz
@ 2018-01-22 22:08 ` Max Reitz
  2018-02-24 15:42 ` [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
  16 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-01-22 22:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Max Reitz, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/151     | 114 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/151.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 120 insertions(+)
 create mode 100755 tests/qemu-iotests/151
 create mode 100644 tests/qemu-iotests/151.out

diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
new file mode 100755
index 0000000000..5e064d62a2
--- /dev/null
+++ b/tests/qemu-iotests/151
@@ -0,0 +1,114 @@
+#!/usr/bin/env python
+#
+# Tests for active mirroring
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import iotests
+from iotests import qemu_img
+
+source_img = os.path.join(iotests.test_dir, 'source.' + iotests.imgfmt)
+target_img = os.path.join(iotests.test_dir, 'target.' + iotests.imgfmt)
+
+class TestActiveMirror(iotests.QMPTestCase):
+    image_len = 128 * 1024 * 1024 # 128 MB
+    potential_writes_in_flight = True
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, source_img, '128M')
+        qemu_img('create', '-f', iotests.imgfmt, target_img, '128M')
+
+        blk_source = {'id': 'source',
+                      'if': 'none',
+                      'node-name': 'source-node',
+                      'driver': iotests.imgfmt,
+                      'file': {'driver': 'file',
+                               'filename': source_img}}
+
+        blk_target = {'node-name': 'target-node',
+                      'driver': iotests.imgfmt,
+                      'file': {'driver': 'file',
+                               'filename': target_img}}
+
+        self.vm = iotests.VM()
+        self.vm.add_drive_raw(self.qmp_to_opts(blk_source))
+        self.vm.add_blockdev(self.qmp_to_opts(blk_target))
+        self.vm.add_device('virtio-blk,drive=source')
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+
+        if not self.potential_writes_in_flight:
+            self.assertTrue(iotests.compare_images(source_img, target_img),
+                            'mirror target does not match source')
+
+        os.remove(source_img)
+        os.remove(target_img)
+
+    def doActiveIO(self, sync_source_and_target):
+        # Fill the source image
+        self.vm.hmp_qemu_io('source',
+                            'write -P 1 0 %i' % self.image_len);
+
+        # Start some background requests
+        for offset in range(0, self.image_len, 1024 * 1024):
+            self.vm.hmp_qemu_io('source', 'aio_write -P 2 %i 1M' % offset)
+
+        # Start the block job
+        result = self.vm.qmp('blockdev-mirror',
+                             job_id='mirror',
+                             filter_node_name='mirror-node',
+                             device='source-node',
+                             target='target-node',
+                             sync='full',
+                             copy_mode='write-blocking')
+        self.assert_qmp(result, 'return', {})
+
+        # Start some more requests
+        for offset in range(0, self.image_len, 1024 * 1024):
+            self.vm.hmp_qemu_io('source', 'aio_write -P 3 %i 1M' % offset)
+
+        # Wait for the READY event
+        self.wait_ready(drive='mirror')
+
+        # Now start some final requests; all of these (which land on
+        # the source) should be settled using the active mechanism.
+        # The mirror code itself asserts that the source BDS's dirty
+        # bitmap will stay clean between READY and COMPLETED.
+        for offset in range(0, self.image_len, 1024 * 1024):
+            self.vm.hmp_qemu_io('source', 'aio_write -P 4 %i 1M' % offset)
+
+        if sync_source_and_target:
+            # If source and target should be in sync after the mirror,
+            # we have to flush before completion
+            self.vm.hmp_qemu_io('source', 'aio_flush')
+            self.potential_writes_in_flight = False
+
+        self.complete_and_wait(drive='mirror', wait_ready=False)
+
+    def testActiveIO(self):
+        self.doActiveIO(False)
+
+    def testActiveIOFlushed(self):
+        self.doActiveIO(True)
+
+
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2', 'raw'])
diff --git a/tests/qemu-iotests/151.out b/tests/qemu-iotests/151.out
new file mode 100644
index 0000000000..fbc63e62f8
--- /dev/null
+++ b/tests/qemu-iotests/151.out
@@ -0,0 +1,5 @@
+..
+----------------------------------------------------------------------
+Ran 2 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 8fc4f62cca..1a37531d78 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -157,6 +157,7 @@
 148 rw auto quick
 149 rw auto sudo
 150 rw auto quick
+151 rw auto
 152 rw auto quick
 153 rw auto quick
 154 rw auto backing quick
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring
  2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
                   ` (15 preceding siblings ...)
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 16/16] iotests: Add test for active mirroring Max Reitz
@ 2018-02-24 15:42 ` Max Reitz
  2018-02-27  9:51   ` Fam Zheng
  16 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-02-24 15:42 UTC (permalink / raw)
  To: qemu-block; +Cc: qemu-devel, Kevin Wolf, John Snow, Fam Zheng, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 1437 bytes --]

Ping

On 2018-01-22 23:07, Max Reitz wrote:
> This series implements an active and synchronous mirroring mode.
> 
> Currently, the mirror block job is passive an asynchronous: Depending on
> your start conditions, some part of the source disk starts as "dirty".
> Then, the block job will (as a background operation) continuously copy
> dirty parts to the target disk until all of the source disk is clean.
> In the meantime, any write to the source disk dirties the affected area.
> 
> One effect of this operational mode is that the job may never converge:
> If the writes to the source happen faster than the block job copies data
> to the target, the job can never finish.
> 
> When the active mode implemented in this series is enabled, every write
> request to the source will automatically trigger a synchronous write to
> the target right afterwards.  Therefore, the source can never get dirty
> faster than data is copied to the target.  Most importantly, once source
> and target are in sync (BLOCK_JOB_READY is emitted), they will not
> diverge (unless e.g. an I/O error occurs).
> 
> Active mirroring also improves on a second issue of the passive mode: We
> do not have to read data from the source in order to write it to the
> target.  When new data is written to the source in active mode, it is
> automatically mirrored to the target, which saves us the superfluous
> read from the source.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 512 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines Max Reitz
@ 2018-02-27  7:44   ` Fam Zheng
  2018-02-28 14:13     ` Max Reitz
  0 siblings, 1 reply; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  7:44 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:07, Max Reitz wrote:
> @@ -101,7 +105,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
>      }
>  }
>  
> -static void mirror_iteration_done(MirrorOp *op, int ret)
> +static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
>  {
>      MirrorBlockJob *s = op->s;
>      struct iovec *iov;

I think we want s/qemu_coroutine_enter/aio_co_wake/ in mirror_iteration_done().
When this was an AIO callback, that didn't matter, but now we are in a
terminating coroutine, so it is pointless to defer the termination, or even
risky in that we are in an aio_context_acquire/release section but have
already decremented s->in_flight, which is fishy.
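
That is, roughly (sketch based on the code visible in patch 6, purely
illustrative):

  if (s->waiting_for_io) {
      /* was: qemu_coroutine_enter(s->common.co); */
      aio_co_wake(s->common.co);
  }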

> @@ -138,9 +142,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>      }
>  }
>  
> -static void mirror_write_complete(void *opaque, int ret)
> +static void coroutine_fn mirror_write_complete(MirrorOp *op, int ret)
>  {
> -    MirrorOp *op = opaque;
>      MirrorBlockJob *s = op->s;
>  
>      aio_context_acquire(blk_get_aio_context(s->common.blk));
> @@ -157,9 +160,8 @@ static void mirror_write_complete(void *opaque, int ret)
>      aio_context_release(blk_get_aio_context(s->common.blk));
>  }
>  
> -static void mirror_read_complete(void *opaque, int ret)
> +static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
>  {
> -    MirrorOp *op = opaque;
>      MirrorBlockJob *s = op->s;
>  
>      aio_context_acquire(blk_get_aio_context(s->common.blk));
> @@ -174,8 +176,11 @@ static void mirror_read_complete(void *opaque, int ret)
>  
>          mirror_iteration_done(op, ret);
>      } else {
> -        blk_aio_pwritev(s->target, op->offset, &op->qiov,
> -                        0, mirror_write_complete, op);
> +        int ret;

s/ret/ret2/ or drop the definition?
Because ret is already a parameter of the function.

> +
> +        ret = blk_co_pwritev(s->target, op->offset,
> +                             op->qiov.size, &op->qiov, 0);
> +        mirror_write_complete(op, ret);
>      }
>      aio_context_release(blk_get_aio_context(s->common.blk));
>  }

<snip>

> +static void coroutine_fn mirror_co_discard(void *opaque)
> +{
> +    MirrorOp *op = opaque;
> +    int ret;
> +
> +    op->s->in_flight++;
> +    op->s->bytes_in_flight += op->bytes;
> +    *op->bytes_handled = op->bytes;
> +
> +    ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
> +    mirror_write_complete(op, ret);
>  }
>  
>  static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
>                                 unsigned bytes, MirrorMethod mirror_method)

Doesn't mirror_perform need coroutine_fn annotation too?

>  {

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/16] block/mirror: Use CoQueue to wait on in-flight ops
  2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 06/16] block/mirror: Use CoQueue to wait on in-flight ops Max Reitz
@ 2018-02-27  8:37   ` Fam Zheng
  0 siblings, 0 replies; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  8:37 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:07, Max Reitz wrote:
>      qemu_iovec_destroy(&op->qiov);
> -    g_free(op);
>  
> -    if (s->waiting_for_io) {
> -        qemu_coroutine_enter(s->common.co);
> -    }
> +    qemu_co_queue_restart_all(&op->waiting_requests);
> +    g_free(op);

OK, this answers my question to patch 5.

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next()
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next() Max Reitz
@ 2018-02-27  8:59   ` Fam Zheng
  2018-02-28 14:28     ` Max Reitz
  0 siblings, 1 reply; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  8:59 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:08, Max Reitz wrote:
> diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
> index 9091c639b3..2a2aa5bd43 100644
> --- a/tests/test-hbitmap.c
> +++ b/tests/test-hbitmap.c
> @@ -46,7 +46,7 @@ static void hbitmap_test_check(TestHBitmapData *data,
>  
>      i = first;
>      for (;;) {
> -        next = hbitmap_iter_next(&hbi);
> +        next = hbitmap_iter_next(&hbi, true);
>          if (next < 0) {
>              next = data->size;
>          }
> @@ -435,25 +435,25 @@ static void test_hbitmap_iter_granularity(TestHBitmapData *data,
>      /* Note that hbitmap_test_check has to be invoked manually in this test.  */
>      hbitmap_test_init(data, 131072 << 7, 7);
>      hbitmap_iter_init(&hbi, data->hb, 0);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>  
>      hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
>      hbitmap_iter_init(&hbi, data->hb, 0);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>  
>      hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>  
>      hbitmap_test_set(data, (131072 << 7) - 8, 8);
>      hbitmap_iter_init(&hbi, data->hb, 0);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>  
>      hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>  }

Please add tests for advance=false too.
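
For instance, something along these lines (a sketch only, reusing the
existing fixture and macros from this file):

  HBitmapIter hbi;

  hbitmap_test_init(data, 131072 << 7, 7);
  hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
  hbitmap_iter_init(&hbi, data->hb, 0);
  /* Peeking twice does not move the iterator... */
  g_assert_cmpint(hbitmap_iter_next(&hbi, false), ==, (L2 + L1 + 1) << 7);
  g_assert_cmpint(hbitmap_iter_next(&hbi, false), ==, (L2 + L1 + 1) << 7);
  /* ...consuming the position does. */
  g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
  g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);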

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area Max Reitz
@ 2018-02-27  9:06   ` Fam Zheng
  2018-02-28 14:57     ` Max Reitz
  0 siblings, 1 reply; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  9:06 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:08, Max Reitz wrote:
> This new function allows to look for a consecutively dirty area in a
> dirty bitmap.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  include/block/dirty-bitmap.h |  2 ++
>  block/dirty-bitmap.c         | 51 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index a591c27213..35f3ccc44c 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -79,6 +79,8 @@ void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>  void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>                                      int64_t offset, int64_t bytes);
>  int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
> +bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
> +                               uint64_t *offset, int *bytes);
>  void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>  int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 50564fa1e2..484b5dda43 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -501,6 +501,57 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
>      return hbitmap_iter_next(&iter->hbi, true);
>  }
>  
> +/**
> + * Return the next consecutively dirty area in the dirty bitmap
> + * belonging to the given iterator @iter.
> + *
> + * @max_offset: Maximum value that may be returned for
> + *              *offset + *bytes
> + * @offset:     Will contain the start offset of the next dirty area
> + * @bytes:      Will contain the length of the next dirty area
> + *
> + * Returns: True if a dirty area could be found before max_offset
> + *          (which means that *offset and *bytes then contain valid
> + *          values), false otherwise.

Also document the change to the iter cursor depending on the return value?

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops Max Reitz
@ 2018-02-27  9:12   ` Fam Zheng
  2018-02-28 15:05     ` Max Reitz
  0 siblings, 1 reply; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  9:12 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:08, Max Reitz wrote:
> Currently, the mirror block job only knows passive operations.  But once
> we introduce active writes, we need to distinguish between the two; for
> example, mirror_wait_for_free_in_flight_slot() should wait for a passive
> operation because active writes will not use the same in-flight slots.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/mirror.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 2363e79563..bb46f3c4e9 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -89,6 +89,7 @@ struct MirrorOp {
>      int64_t *bytes_handled;
>  
>      bool is_pseudo_op;
> +    bool is_active_write;
>      CoQueue waiting_requests;
>  
>      QTAILQ_ENTRY(MirrorOp) next;
> @@ -281,8 +282,10 @@ static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
>           * some other operation to start, which may in fact be the
>           * caller of this function.  Since there is only one pseudo op
>           * at any given time, we will always find some real operation
> -         * to wait on. */
> -        if (!op->is_pseudo_op) {
> +         * to wait on.
> +         * Also, only non-active operations use up in-flight slots, so
> +         * we can ignore active operations. */
> +        if (!op->is_pseudo_op && !op->is_active_write) {
>              qemu_co_queue_wait(&op->waiting_requests, NULL);
>              return;
>          }
> -- 
> 2.14.3
> 

I'd just squash this patch into 14 to avoid code churn.

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring Max Reitz
@ 2018-02-27  9:34   ` Fam Zheng
  2018-02-27 14:25     ` Eric Blake
  2018-02-28 15:06     ` Max Reitz
  0 siblings, 2 replies; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  9:34 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:08, Max Reitz wrote:
> @@ -1151,7 +1285,48 @@ static int coroutine_fn bdrv_mirror_top_preadv(BlockDriverState *bs,
>  static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
>      uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
>  {
> -    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
> +    MirrorOp *op = NULL;
> +    MirrorBDSOpaque *s = bs->opaque;
> +    QEMUIOVector bounce_qiov;
> +    void *bounce_buf;
> +    int ret = 0;
> +    bool copy_to_target;
> +
> +    copy_to_target = s->job->ret >= 0 &&
> +                     s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
> +
> +    if (copy_to_target) {
> +        /* The guest might concurrently modify the data to write; but
> +         * the data on source and destination must match, so we have
> +         * to use a bounce buffer if we are going to write to the
> +         * target now. */
> +        bounce_buf = qemu_blockalign(bs, bytes);
> +        iov_to_buf_full(qiov->iov, qiov->niov, 0, bounce_buf, bytes);

Quorum doesn't use a bounce buffer, so I think we can get away without it too: a
guest concurrently modifying the buffer isn't a concern in practice.

> +
> +        qemu_iovec_init(&bounce_qiov, 1);
> +        qemu_iovec_add(&bounce_qiov, bounce_buf, bytes);
> +        qiov = &bounce_qiov;
> +
> +        op = active_write_prepare(s->job, offset, bytes);
> +    }
> +
> +    ret = bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    if (copy_to_target) {
> +        do_sync_target_write(s->job, offset, bytes, qiov, flags);
> +    }
> +
> +out:
> +    if (copy_to_target) {
> +        active_write_settle(op);
> +
> +        qemu_iovec_destroy(&bounce_qiov);
> +        qemu_vfree(bounce_buf);
> +    }
> +    return ret;
>  }
>  
>  static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)

Don't you need to update bdrv_mirror_top_pdiscard and bdrv_mirror_top_pwrite_zeroes
too?

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface
  2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface Max Reitz
@ 2018-02-27  9:38   ` Fam Zheng
  2018-02-28 15:55     ` Max Reitz
  0 siblings, 1 reply; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  9:38 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Mon, 01/22 23:08, Max Reitz wrote:
> This patch allows the user to specify whether to use active or only
> passive mode for mirror block jobs.  Currently, this setting will remain

I think you want s/passive/background/ in the whole patch.

> constant for the duration of the entire block job.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  qapi/block-core.json      | 11 +++++++++--
>  include/block/block_int.h |  4 +++-
>  block/mirror.c            | 12 +++++++-----
>  blockdev.c                |  9 ++++++++-
>  4 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ba1fd736f5..5fafa5fcac 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1573,6 +1573,9 @@
>  #         written. Both will result in identical contents.
>  #         Default is true. (Since 2.4)
>  #
> +# @copy-mode: when to copy data to the destination; defaults to 'passive'
> +#             (Since: 2.12)
> +#
>  # Since: 1.3
>  ##
>  { 'struct': 'DriveMirror',
> @@ -1582,7 +1585,7 @@
>              '*speed': 'int', '*granularity': 'uint32',
>              '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
>              '*on-target-error': 'BlockdevOnError',
> -            '*unmap': 'bool' } }
> +            '*unmap': 'bool', '*copy-mode': 'MirrorCopyMode' } }
>  
>  ##
>  # @BlockDirtyBitmap:
> @@ -1761,6 +1764,9 @@
>  #                    above @device. If this option is not given, a node name is
>  #                    autogenerated. (Since: 2.9)
>  #
> +# @copy-mode: when to copy data to the destination; defaults to 'passive'
> +#             (Since: 2.12)
> +#
>  # Returns: nothing on success.
>  #
>  # Since: 2.6
> @@ -1781,7 +1787,8 @@
>              '*speed': 'int', '*granularity': 'uint32',
>              '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
>              '*on-target-error': 'BlockdevOnError',
> -            '*filter-node-name': 'str' } }
> +            '*filter-node-name': 'str',
> +            '*copy-mode': 'MirrorCopyMode' } }
>  
>  ##
>  # @block_set_io_throttle:
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 03f3fdd129..1fda4d3d43 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -948,6 +948,7 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
>   * @filter_node_name: The node name that should be assigned to the filter
>   * driver that the mirror job inserts into the graph above @bs. NULL means that
>   * a node name should be autogenerated.
> + * @copy_mode: When to trigger writes to the target.
>   * @errp: Error object.
>   *
>   * Start a mirroring operation on @bs.  Clusters that are allocated
> @@ -961,7 +962,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>                    MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>                    BlockdevOnError on_source_error,
>                    BlockdevOnError on_target_error,
> -                  bool unmap, const char *filter_node_name, Error **errp);
> +                  bool unmap, const char *filter_node_name,
> +                  MirrorCopyMode copy_mode, Error **errp);
>  
>  /*
>   * backup_job_create:
> diff --git a/block/mirror.c b/block/mirror.c
> index 83082adb64..3b23886a5a 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1409,7 +1409,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>                               const BlockJobDriver *driver,
>                               bool is_none_mode, BlockDriverState *base,
>                               bool auto_complete, const char *filter_node_name,
> -                             bool is_mirror,
> +                             bool is_mirror, MirrorCopyMode copy_mode,
>                               Error **errp)
>  {
>      MirrorBlockJob *s;
> @@ -1515,7 +1515,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>      s->on_target_error = on_target_error;
>      s->is_none_mode = is_none_mode;
>      s->backing_mode = backing_mode;
> -    s->copy_mode = MIRROR_COPY_MODE_BACKGROUND;
> +    s->copy_mode = copy_mode;
>      s->base = base;
>      s->granularity = granularity;
>      s->buf_size = ROUND_UP(buf_size, granularity);
> @@ -1582,7 +1582,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>                    MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>                    BlockdevOnError on_source_error,
>                    BlockdevOnError on_target_error,
> -                  bool unmap, const char *filter_node_name, Error **errp)
> +                  bool unmap, const char *filter_node_name,
> +                  MirrorCopyMode copy_mode, Error **errp)
>  {
>      bool is_none_mode;
>      BlockDriverState *base;
> @@ -1597,7 +1598,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>                       speed, granularity, buf_size, backing_mode,
>                       on_source_error, on_target_error, unmap, NULL, NULL,
>                       &mirror_job_driver, is_none_mode, base, false,
> -                     filter_node_name, true, errp);
> +                     filter_node_name, true, copy_mode, errp);
>  }
>  
>  void commit_active_start(const char *job_id, BlockDriverState *bs,
> @@ -1620,7 +1621,8 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
>                       MIRROR_LEAVE_BACKING_CHAIN,
>                       on_error, on_error, true, cb, opaque,
>                       &commit_active_job_driver, false, base, auto_complete,
> -                     filter_node_name, false, &local_err);
> +                     filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,

I think for active commit, MIRROR_COPY_MODE_WRITE_BLOCKING is more appealing?
Can be a task for another day, though.

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring
  2018-02-24 15:42 ` [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
@ 2018-02-27  9:51   ` Fam Zheng
  0 siblings, 0 replies; 35+ messages in thread
From: Fam Zheng @ 2018-02-27  9:51 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Sat, 02/24 16:42, Max Reitz wrote:
> Ping

Looks good in general. I've left some small questions though.

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring
  2018-02-27  9:34   ` Fam Zheng
@ 2018-02-27 14:25     ` Eric Blake
  2018-02-28 15:06     ` Max Reitz
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Blake @ 2018-02-27 14:25 UTC (permalink / raw)
  To: Fam Zheng, Max Reitz
  Cc: Kevin Wolf, Stefan Hajnoczi, John Snow, qemu-devel, qemu-block

On 02/27/2018 03:34 AM, Fam Zheng wrote:
> On Mon, 01/22 23:08, Max Reitz wrote:
>> @@ -1151,7 +1285,48 @@ static int coroutine_fn bdrv_mirror_top_preadv(BlockDriverState *bs,
>>   static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
>>       uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
>>   {
>> -    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
>> +    MirrorOp *op = NULL;
>> +    MirrorBDSOpaque *s = bs->opaque;
>> +    QEMUIOVector bounce_qiov;
>> +    void *bounce_buf;
>> +    int ret = 0;
>> +    bool copy_to_target;
>> +
>> +    copy_to_target = s->job->ret >= 0 &&
>> +                     s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
>> +
>> +    if (copy_to_target) {
>> +        /* The guest might concurrently modify the data to write; but
>> +         * the data on source and destination must match, so we have
>> +         * to use a bounce buffer if we are going to write to the
>> +         * target now. */
>> +        bounce_buf = qemu_blockalign(bs, bytes);
>> +        iov_to_buf_full(qiov->iov, qiov->niov, 0, bounce_buf, bytes);
> 
> Quorum doesn't use a bounce buffer, so I think we can get away without it too: a
> guest concurrently modifying the buffer isn't a concern in practice.

Arguably, that's a bug in quorum.  We also use a bounce buffer for the 
same reason when encrypting.  We really do need to make sure that bits 
landing in more than one storage location come from the same point in time.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines
  2018-02-27  7:44   ` Fam Zheng
@ 2018-02-28 14:13     ` Max Reitz
  2018-02-28 17:07       ` Max Reitz
  0 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-02-28 14:13 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-27 08:44, Fam Zheng wrote:
> On Mon, 01/22 23:07, Max Reitz wrote:
>> @@ -101,7 +105,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
>>      }
>>  }
>>  
>> -static void mirror_iteration_done(MirrorOp *op, int ret)
>> +static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
>>  {
>>      MirrorBlockJob *s = op->s;
>>      struct iovec *iov;
> 
> I think we want s/qemu_coroutine_enter/aio_co_wake/ in mirror_iteration_done().
> As an AIO callback before, this didn't matter, but now we are in a terminating
> coroutine, so it is pointless to defer the termination, or even risky in that we
> are in an aio_context_acquire/release section, but have already decremented
> s->in_flight, which is fishy.

I guess I'll still do the replacement, regardless of whether the next
patch overwrites it again...

>> @@ -138,9 +142,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
>>      }
>>  }
>>  
>> -static void mirror_write_complete(void *opaque, int ret)
>> +static void coroutine_fn mirror_write_complete(MirrorOp *op, int ret)
>>  {
>> -    MirrorOp *op = opaque;
>>      MirrorBlockJob *s = op->s;
>>  
>>      aio_context_acquire(blk_get_aio_context(s->common.blk));
>> @@ -157,9 +160,8 @@ static void mirror_write_complete(void *opaque, int ret)
>>      aio_context_release(blk_get_aio_context(s->common.blk));
>>  }
>>  
>> -static void mirror_read_complete(void *opaque, int ret)
>> +static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
>>  {
>> -    MirrorOp *op = opaque;
>>      MirrorBlockJob *s = op->s;
>>  
>>      aio_context_acquire(blk_get_aio_context(s->common.blk));
>> @@ -174,8 +176,11 @@ static void mirror_read_complete(void *opaque, int ret)
>>  
>>          mirror_iteration_done(op, ret);
>>      } else {
>> -        blk_aio_pwritev(s->target, op->offset, &op->qiov,
>> -                        0, mirror_write_complete, op);
>> +        int ret;
> 
> s/ret/ret2/ or drop the definition?
> because ret is already the parameter of the function.

Oh, right, yes, will do.

>> +
>> +        ret = blk_co_pwritev(s->target, op->offset,
>> +                             op->qiov.size, &op->qiov, 0);
>> +        mirror_write_complete(op, ret);
>>      }
>>      aio_context_release(blk_get_aio_context(s->common.blk));
>>  }
> 
> <snip>
> 
>> +static void coroutine_fn mirror_co_discard(void *opaque)
>> +{
>> +    MirrorOp *op = opaque;
>> +    int ret;
>> +
>> +    op->s->in_flight++;
>> +    op->s->bytes_in_flight += op->bytes;
>> +    *op->bytes_handled = op->bytes;
>> +
>> +    ret = blk_co_pdiscard(op->s->target, op->offset, op->bytes);
>> +    mirror_write_complete(op, ret);
>>  }
>>  
>>  static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
>>                                 unsigned bytes, MirrorMethod mirror_method)
> 
> Doesn't mirror_perform need coroutine_fn annotation too?

I don't think it needs one.  We could give it one, but as far as I've
understood (which may be wrong), all functions that need to be run from
a coroutine need the tag -- but functions that may be called from either
coroutines or just normal code don't need it.

(And I think this function should be fine either way, so I don't think
it needs a tag.)
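
A made-up illustration of that convention (both function names are
invented):

    /* Must be called from coroutine context because it can yield, so it
     * gets the marker: */
    static void coroutine_fn wait_for_slot(MirrorOp *op)
    {
        qemu_co_queue_wait(&op->waiting_requests, NULL);
    }

    /* Callable from coroutine and non-coroutine context alike, so no
     * marker is needed: */
    static void account_op(MirrorBlockJob *s, uint64_t bytes)
    {
        s->in_flight++;
        s->bytes_in_flight += bytes;
    }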


Also, thanks for reviewing! :-)

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next()
  2018-02-27  8:59   ` Fam Zheng
@ 2018-02-28 14:28     ` Max Reitz
  0 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-02-28 14:28 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-27 09:59, Fam Zheng wrote:
> On Mon, 01/22 23:08, Max Reitz wrote:
>> diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
>> index 9091c639b3..2a2aa5bd43 100644
>> --- a/tests/test-hbitmap.c
>> +++ b/tests/test-hbitmap.c
>> @@ -46,7 +46,7 @@ static void hbitmap_test_check(TestHBitmapData *data,
>>  
>>      i = first;
>>      for (;;) {
>> -        next = hbitmap_iter_next(&hbi);
>> +        next = hbitmap_iter_next(&hbi, true);
>>          if (next < 0) {
>>              next = data->size;
>>          }
>> @@ -435,25 +435,25 @@ static void test_hbitmap_iter_granularity(TestHBitmapData *data,
>>      /* Note that hbitmap_test_check has to be invoked manually in this test.  */
>>      hbitmap_test_init(data, 131072 << 7, 7);
>>      hbitmap_iter_init(&hbi, data->hb, 0);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>>  
>>      hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
>>      hbitmap_iter_init(&hbi, data->hb, 0);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>>  
>>      hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>>  
>>      hbitmap_test_set(data, (131072 << 7) - 8, 8);
>>      hbitmap_iter_init(&hbi, data->hb, 0);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, (L2 + L1 + 1) << 7);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, (L2 + L1 + 1) << 7);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>>  
>>      hbitmap_iter_init(&hbi, data->hb, (L2 + L1 + 2) << 7);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), ==, 131071 << 7);
>> -    g_assert_cmpint(hbitmap_iter_next(&hbi), <, 0);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), ==, 131071 << 7);
>> +    g_assert_cmpint(hbitmap_iter_next(&hbi, true), <, 0);
>>  }
> 
> Please add tests for advance=false too.

:C

But writing tests is so hard!  And then most of the time you even find
bugs in your own code.  Which makes it even harder... O:-)

Will do.

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area
  2018-02-27  9:06   ` Fam Zheng
@ 2018-02-28 14:57     ` Max Reitz
  0 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-02-28 14:57 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-27 10:06, Fam Zheng wrote:
> On Mon, 01/22 23:08, Max Reitz wrote:
>> This new function allows looking for a consecutive dirty area in a
>> dirty bitmap.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  include/block/dirty-bitmap.h |  2 ++
>>  block/dirty-bitmap.c         | 51 ++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 53 insertions(+)
>>
>> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
>> index a591c27213..35f3ccc44c 100644
>> --- a/include/block/dirty-bitmap.h
>> +++ b/include/block/dirty-bitmap.h
>> @@ -79,6 +79,8 @@ void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>>  void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
>>                                      int64_t offset, int64_t bytes);
>>  int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
>> +bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
>> +                               uint64_t *offset, int *bytes);
>>  void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t offset);
>>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>>  int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>> index 50564fa1e2..484b5dda43 100644
>> --- a/block/dirty-bitmap.c
>> +++ b/block/dirty-bitmap.c
>> @@ -501,6 +501,57 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
>>      return hbitmap_iter_next(&iter->hbi, true);
>>  }
>>  
>> +/**
>> + * Return the next consecutively dirty area in the dirty bitmap
>> + * belonging to the given iterator @iter.
>> + *
>> + * @max_offset: Maximum value that may be returned for
>> + *              *offset + *bytes
>> + * @offset:     Will contain the start offset of the next dirty area
>> + * @bytes:      Will contain the length of the next dirty area
>> + *
>> + * Returns: True if a dirty area could be found before max_offset
>> + *          (which means that *offset and *bytes then contain valid
>> + *          values), false otherwise.
> 
> Also document the change to the iter cursor depending on the return value?

Good point, since it may be unexpected that the iterator isn't advanced if
max_offset would be exceeded.
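
Something along these lines in the comment would probably cover it (just
a suggested wording, to be adjusted to whatever the implementation
actually guarantees):

 * If false is returned, the iterator is not advanced, so the same area
 * can still be returned by a later call once max_offset permits it.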

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops
  2018-02-27  9:12   ` Fam Zheng
@ 2018-02-28 15:05     ` Max Reitz
  0 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-02-28 15:05 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-27 10:12, Fam Zheng wrote:
> On Mon, 01/22 23:08, Max Reitz wrote:
>> Currently, the mirror block job only knows passive operations.  But once
>> we introduce active writes, we need to distinguish between the two; for
>> example, mirror_wait_for_free_in_flight_slot() should wait for a passive
>> operation because active writes will not use the same in-flight slots.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/mirror.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 2363e79563..bb46f3c4e9 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -89,6 +89,7 @@ struct MirrorOp {
>>      int64_t *bytes_handled;
>>  
>>      bool is_pseudo_op;
>> +    bool is_active_write;
>>      CoQueue waiting_requests;
>>  
>>      QTAILQ_ENTRY(MirrorOp) next;
>> @@ -281,8 +282,10 @@ static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
>>           * some other operation to start, which may in fact be the
>>           * caller of this function.  Since there is only one pseudo op
>>           * at any given time, we will always find some real operation
>> -         * to wait on. */
>> -        if (!op->is_pseudo_op) {
>> +         * to wait on.
>> +         * Also, only non-active operations use up in-flight slots, so
>> +         * we can ignore active operations. */
>> +        if (!op->is_pseudo_op && !op->is_active_write) {
>>              qemu_co_queue_wait(&op->waiting_requests, NULL);
>>              return;
>>          }
>> -- 
>> 2.14.3
>>
> 
> I'd just squash this patch into 14 to avoid code churn.

I wanted to pull out as much as possible from 14 since it is massive as
it is, but if you think it makes reviewing even harder... Then I can put
it in there, of course.

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring
  2018-02-27  9:34   ` Fam Zheng
  2018-02-27 14:25     ` Eric Blake
@ 2018-02-28 15:06     ` Max Reitz
  1 sibling, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-02-28 15:06 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-27 10:34, Fam Zheng wrote:
> On Mon, 01/22 23:08, Max Reitz wrote:
>> @@ -1151,7 +1285,48 @@ static int coroutine_fn bdrv_mirror_top_preadv(BlockDriverState *bs,
>>  static int coroutine_fn bdrv_mirror_top_pwritev(BlockDriverState *bs,
>>      uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
>>  {
>> -    return bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
>> +    MirrorOp *op = NULL;
>> +    MirrorBDSOpaque *s = bs->opaque;
>> +    QEMUIOVector bounce_qiov;
>> +    void *bounce_buf;
>> +    int ret = 0;
>> +    bool copy_to_target;
>> +
>> +    copy_to_target = s->job->ret >= 0 &&
>> +                     s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
>> +
>> +    if (copy_to_target) {
>> +        /* The guest might concurrently modify the data to write; but
>> +         * the data on source and destination must match, so we have
>> +         * to use a bounce buffer if we are going to write to the
>> +         * target now. */
>> +        bounce_buf = qemu_blockalign(bs, bytes);
>> +        iov_to_buf_full(qiov->iov, qiov->niov, 0, bounce_buf, bytes);
> 
> Quorum doesn't use a bounce buffer, so I think we can get away without it too: a
> guest concurrently modifying the buffer isn't a concern in practice.
> 
>> +
>> +        qemu_iovec_init(&bounce_qiov, 1);
>> +        qemu_iovec_add(&bounce_qiov, bounce_buf, bytes);
>> +        qiov = &bounce_qiov;
>> +
>> +        op = active_write_prepare(s->job, offset, bytes);
>> +    }
>> +
>> +    ret = bdrv_co_pwritev(bs->backing, offset, bytes, qiov, flags);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    if (copy_to_target) {
>> +        do_sync_target_write(s->job, offset, bytes, qiov, flags);
>> +    }
>> +
>> +out:
>> +    if (copy_to_target) {
>> +        active_write_settle(op);
>> +
>> +        qemu_iovec_destroy(&bounce_qiov);
>> +        qemu_vfree(bounce_buf);
>> +    }
>> +    return ret;
>>  }
>>  
>>  static int coroutine_fn bdrv_mirror_top_flush(BlockDriverState *bs)
> 
> Don't you need to update bdrv_mirror_top_pdiscard and bdrv_mirror_top_pwrite_zeroes
> too?

Now that you mention it...
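
Probably something of this shape for the discard path, with
pwrite_zeroes done analogously (just a sketch; do_sync_target_discard()
is a made-up placeholder for however the target-side discard will
actually be issued):

    static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
        int64_t offset, int bytes)
    {
        MirrorBDSOpaque *s = bs->opaque;
        MirrorOp *op = NULL;
        bool copy_to_target;
        int ret;

        copy_to_target = s->job->ret >= 0 &&
                         s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;

        if (copy_to_target) {
            /* Serialize against overlapping active writes */
            op = active_write_prepare(s->job, offset, bytes);
        }

        /* Keep the existing pass-through to the source */
        ret = bdrv_co_pdiscard(bs->backing->bs, offset, bytes);

        if (ret >= 0 && copy_to_target) {
            /* Made-up helper: discard the same range on the target */
            do_sync_target_discard(s->job, offset, bytes);
        }

        if (copy_to_target) {
            active_write_settle(op);
        }

        return ret;
    }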

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface
  2018-02-27  9:38   ` Fam Zheng
@ 2018-02-28 15:55     ` Max Reitz
  0 siblings, 0 replies; 35+ messages in thread
From: Max Reitz @ 2018-02-28 15:55 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-27 10:38, Fam Zheng wrote:
> On Mon, 01/22 23:08, Max Reitz wrote:
>> This patch allows the user to specify whether to use active or only
>> passive mode for mirror block jobs.  Currently, this setting will remain
> 
> I think you want s/passive/background/ in the whole patch.

Errr, yes.
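
For reference, with this patch applied, a write-blocking mirror would
then be requested roughly like this (node names made up):

    { "execute": "blockdev-mirror",
      "arguments": { "job-id": "mirror0",
                     "device": "source-node",
                     "target": "target-node",
                     "sync": "full",
                     "copy-mode": "write-blocking" } }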

Max

>> constant for the duration of the entire block job.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  qapi/block-core.json      | 11 +++++++++--
>>  include/block/block_int.h |  4 +++-
>>  block/mirror.c            | 12 +++++++-----
>>  blockdev.c                |  9 ++++++++-
>>  4 files changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index ba1fd736f5..5fafa5fcac 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -1573,6 +1573,9 @@
>>  #         written. Both will result in identical contents.
>>  #         Default is true. (Since 2.4)
>>  #
>> +# @copy-mode: when to copy data to the destination; defaults to 'passive'
>> +#             (Since: 2.12)
>> +#
>>  # Since: 1.3
>>  ##
>>  { 'struct': 'DriveMirror',
>> @@ -1582,7 +1585,7 @@
>>              '*speed': 'int', '*granularity': 'uint32',
>>              '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
>>              '*on-target-error': 'BlockdevOnError',
>> -            '*unmap': 'bool' } }
>> +            '*unmap': 'bool', '*copy-mode': 'MirrorCopyMode' } }
>>  
>>  ##
>>  # @BlockDirtyBitmap:
>> @@ -1761,6 +1764,9 @@
>>  #                    above @device. If this option is not given, a node name is
>>  #                    autogenerated. (Since: 2.9)
>>  #
>> +# @copy-mode: when to copy data to the destination; defaults to 'passive'
>> +#             (Since: 2.12)
>> +#
>>  # Returns: nothing on success.
>>  #
>>  # Since: 2.6
>> @@ -1781,7 +1787,8 @@
>>              '*speed': 'int', '*granularity': 'uint32',
>>              '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
>>              '*on-target-error': 'BlockdevOnError',
>> -            '*filter-node-name': 'str' } }
>> +            '*filter-node-name': 'str',
>> +            '*copy-mode': 'MirrorCopyMode' } }
>>  
>>  ##
>>  # @block_set_io_throttle:
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 03f3fdd129..1fda4d3d43 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -948,6 +948,7 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
>>   * @filter_node_name: The node name that should be assigned to the filter
>>   * driver that the mirror job inserts into the graph above @bs. NULL means that
>>   * a node name should be autogenerated.
>> + * @copy_mode: When to trigger writes to the target.
>>   * @errp: Error object.
>>   *
>>   * Start a mirroring operation on @bs.  Clusters that are allocated
>> @@ -961,7 +962,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>>                    MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>>                    BlockdevOnError on_source_error,
>>                    BlockdevOnError on_target_error,
>> -                  bool unmap, const char *filter_node_name, Error **errp);
>> +                  bool unmap, const char *filter_node_name,
>> +                  MirrorCopyMode copy_mode, Error **errp);
>>  
>>  /*
>>   * backup_job_create:
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 83082adb64..3b23886a5a 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -1409,7 +1409,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>>                               const BlockJobDriver *driver,
>>                               bool is_none_mode, BlockDriverState *base,
>>                               bool auto_complete, const char *filter_node_name,
>> -                             bool is_mirror,
>> +                             bool is_mirror, MirrorCopyMode copy_mode,
>>                               Error **errp)
>>  {
>>      MirrorBlockJob *s;
>> @@ -1515,7 +1515,7 @@ static void mirror_start_job(const char *job_id, BlockDriverState *bs,
>>      s->on_target_error = on_target_error;
>>      s->is_none_mode = is_none_mode;
>>      s->backing_mode = backing_mode;
>> -    s->copy_mode = MIRROR_COPY_MODE_BACKGROUND;
>> +    s->copy_mode = copy_mode;
>>      s->base = base;
>>      s->granularity = granularity;
>>      s->buf_size = ROUND_UP(buf_size, granularity);
>> @@ -1582,7 +1582,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>>                    MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
>>                    BlockdevOnError on_source_error,
>>                    BlockdevOnError on_target_error,
>> -                  bool unmap, const char *filter_node_name, Error **errp)
>> +                  bool unmap, const char *filter_node_name,
>> +                  MirrorCopyMode copy_mode, Error **errp)
>>  {
>>      bool is_none_mode;
>>      BlockDriverState *base;
>> @@ -1597,7 +1598,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
>>                       speed, granularity, buf_size, backing_mode,
>>                       on_source_error, on_target_error, unmap, NULL, NULL,
>>                       &mirror_job_driver, is_none_mode, base, false,
>> -                     filter_node_name, true, errp);
>> +                     filter_node_name, true, copy_mode, errp);
>>  }
>>  
>>  void commit_active_start(const char *job_id, BlockDriverState *bs,
>> @@ -1620,7 +1621,8 @@ void commit_active_start(const char *job_id, BlockDriverState *bs,
>>                       MIRROR_LEAVE_BACKING_CHAIN,
>>                       on_error, on_error, true, cb, opaque,
>>                       &commit_active_job_driver, false, base, auto_complete,
>> -                     filter_node_name, false, &local_err);
>> +                     filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
> 
> I think for active commit, MIRROR_COPY_MODE_WRITE_BLOCKING is more appealing?
> Can be a task for another day, though.

Hmm...  In theory a good point.  But in practice we currently still do
the write to both the overlay and the backing file, so it has the same
issues as normal mirroring.

But for active commit, we have the interesting property that one write
would actually be sufficient: We could do the write only to the backing
file, and then punch a hole into the overlay so that the backing file
data shines through.  So if we did that, we should really do
MIRROR_COPY_MODE_WRITE_BLOCKING.

But we currently don't, so I'd rather keep it in background mode until
we do that. :-)

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines
  2018-02-28 14:13     ` Max Reitz
@ 2018-02-28 17:07       ` Max Reitz
  2018-03-01  2:50         ` Fam Zheng
  0 siblings, 1 reply; 35+ messages in thread
From: Max Reitz @ 2018-02-28 17:07 UTC (permalink / raw)
  To: Fam Zheng; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On 2018-02-28 15:13, Max Reitz wrote:
> On 2018-02-27 08:44, Fam Zheng wrote:
>> On Mon, 01/22 23:07, Max Reitz wrote:
>>> @@ -101,7 +105,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
>>>      }
>>>  }
>>>  
>>> -static void mirror_iteration_done(MirrorOp *op, int ret)
>>> +static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
>>>  {
>>>      MirrorBlockJob *s = op->s;
>>>      struct iovec *iov;
>>
>> I think we want s/qemu_coroutine_enter/aio_co_wake/ in mirror_iteration_done().
>> As an AIO callback before, this didn't matter, but now we are in a terminating
>> coroutine, so it is pointless to defer the termination, or even risky in that we
>> are in an aio_context_acquire/release section, but have already decremented
>> s->in_flight, which is fishy.
> 
> I guess I'll still do the replacement, regardless of whether the next
> patch overwrites it again...

Maybe I don't.  Doing this breaks iotest 041 because the
assert(data.done) in bdrv_co_yield_to_drain() fails.

Not sure why that is, but under the circumstances I guess it's best to
just pretend this never happened, continue to use qemu_coroutine_enter()
and just replace it in the next patch.

As for in_flight: What is the issue there?  We mostly need that to know
how many I/O requests are actually running, that is, how much buffer
space is used, how much I/O is done concurrently, etc. (and later we
need the in-flight information so that we don't access the target in
overlapping areas concurrently).  But it doesn't seem to be about how
many coroutines there are.

So as long as the s->in_flight decrement is done in the same critical
section as the op is deleted, we should be good...?

Max


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines
  2018-02-28 17:07       ` Max Reitz
@ 2018-03-01  2:50         ` Fam Zheng
  0 siblings, 0 replies; 35+ messages in thread
From: Fam Zheng @ 2018-03-01  2:50 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-block, qemu-devel, Kevin Wolf, John Snow, Stefan Hajnoczi

On Wed, 02/28 18:07, Max Reitz wrote:
> On 2018-02-28 15:13, Max Reitz wrote:
> > On 2018-02-27 08:44, Fam Zheng wrote:
> >> On Mon, 01/22 23:07, Max Reitz wrote:
> >>> @@ -101,7 +105,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
> >>>      }
> >>>  }
> >>>  
> >>> -static void mirror_iteration_done(MirrorOp *op, int ret)
> >>> +static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
> >>>  {
> >>>      MirrorBlockJob *s = op->s;
> >>>      struct iovec *iov;
> >>
> >> I think we want s/qemu_coroutine_enter/aio_co_wake/ in mirror_iteration_done().
> >> As an AIO callback before, this didn't matter, but now we are in a terminating
> >> coroutine, so it is pointless to defer the termination, or even risky in that we
> >> are in an aio_context_acquire/release section, but have already decremented
> >> s->in_flight, which is fishy.
> > 
> > I guess I'll still do the replacement, regardless of whether the next
> > patch overwrites it again...
> 
> Maybe I don't.  Doing this breaks iotest 041 because the
> assert(data.done) in bdrv_co_yield_to_drain() fails.
> 
> Not sure why that is, but under the circumstances I guess it's best to
> just pretend this never happened, continue to use qemu_coroutine_enter()
> and just replace it in the next patch.
> 
> As for in_flight: What is the issue there?  We mostly need that to know
> how many I/O requests are actually running, that is, how much buffer
> space is used, how much I/O is done concurrently, etc. (and later we
> need the in-flight information so that we don't access the target in
> overlapping areas concurrently).  But it doesn't seem to be about how
> many coroutines there are.
> 
> So as long as the s->in_flight decrement is done in the same critical
> section as the op is deleted, we should be good...?

I don't have a specific problem in mind, but I am just generally concerned about
the "if (s->in_flight == 0)" checks around mirror_exit.

Fam

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2018-03-01  2:50 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-22 22:07 [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 01/16] block: BDS deletion during bdrv_drain_recurse Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 02/16] block: BDS deletion in bdrv_do_drained_begin() Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 03/16] tests: Add bdrv-drain test for node deletion Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 04/16] block/mirror: Pull out mirror_perform() Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 05/16] block/mirror: Convert to coroutines Max Reitz
2018-02-27  7:44   ` Fam Zheng
2018-02-28 14:13     ` Max Reitz
2018-02-28 17:07       ` Max Reitz
2018-03-01  2:50         ` Fam Zheng
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 06/16] block/mirror: Use CoQueue to wait on in-flight ops Max Reitz
2018-02-27  8:37   ` Fam Zheng
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 07/16] block/mirror: Wait for in-flight op conflicts Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 08/16] block/mirror: Use source as a BdrvChild Max Reitz
2018-01-22 22:07 ` [Qemu-devel] [PATCH v2 09/16] block: Generalize should_update_child() rule Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 10/16] hbitmap: Add @advance param to hbitmap_iter_next() Max Reitz
2018-02-27  8:59   ` Fam Zheng
2018-02-28 14:28     ` Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 11/16] block/dirty-bitmap: Add bdrv_dirty_iter_next_area Max Reitz
2018-02-27  9:06   ` Fam Zheng
2018-02-28 14:57     ` Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 12/16] block/mirror: Distinguish active from passive ops Max Reitz
2018-02-27  9:12   ` Fam Zheng
2018-02-28 15:05     ` Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 13/16] block/mirror: Add MirrorBDSOpaque Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 14/16] block/mirror: Add active mirroring Max Reitz
2018-02-27  9:34   ` Fam Zheng
2018-02-27 14:25     ` Eric Blake
2018-02-28 15:06     ` Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 15/16] block/mirror: Add copy mode QAPI interface Max Reitz
2018-02-27  9:38   ` Fam Zheng
2018-02-28 15:55     ` Max Reitz
2018-01-22 22:08 ` [Qemu-devel] [PATCH v2 16/16] iotests: Add test for active mirroring Max Reitz
2018-02-24 15:42 ` [Qemu-devel] [PATCH v2 00/16] block/mirror: Add active-sync mirroring Max Reitz
2018-02-27  9:51   ` Fam Zheng
