All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case
@ 2017-01-20  3:47 zhanghailiang
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:47 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang, Juan Quintela, Amit Shah, Dr . David Alan Gilbert,
	eddie.dong

COLO block replication doesn't support the shared disk case,
Here we try to implement it and this is the third version.

Last posted series patches:
https://lists.gnu.org/archive/html/qemu-block/2016-12/msg00039.html
You can refer to the above link if want to test it.

I have uploaded the new version to github:
https://github.com/coloft/qemu/tree/colo-developing-with-shared-disk-2016-1-20

Please review and any commits are welcomed.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com> 
Cc: Dr. David Alan Gilbert (git) <dgilbert@redhat.com>
Cc: eddie.dong@intel.com

v3:
- Fix some comments from Stefan and Eric

v2:
- Drop the patch which add a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
  docs/block-replication: Add description for shared-disk case
  replication: add shared-disk and shared-disk-id options
  replication: Split out backup_do_checkpoint() from
    secondary_do_checkpoint()
  replication: fix code logic with the new shared_disk option
  replication: Implement block replication for shared disk case
  nbd/replication: implement .bdrv_get_info() for nbd and replication
    driver

 block/nbd.c                |  12 ++++
 block/replication.c        | 156 +++++++++++++++++++++++++++++++++++----------
 docs/block-replication.txt | 139 ++++++++++++++++++++++++++++++++++++++--
 qapi/block-core.json       |  10 ++-
 4 files changed, 279 insertions(+), 38 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
@ 2017-01-20  3:47 ` zhanghailiang
  2017-02-27 16:46   ` Stefan Hajnoczi
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:47 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang, Wen Congyang

Introuduce the scenario of shared-disk block replication
and how to use it.

Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
 docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
 effort during a vmstate checkpoint, the disk modification operations of
 the Primary disk are asynchronously forwarded to the Secondary node.
 
-== Workflow ==
+== Non-shared disk workflow ==
 The following is the image of block replication workflow:
 
         +----------------------+            +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
     4) Secondary write requests will be buffered in the Disk buffer and it
        will overwrite the existing sector content in the buffer.
 
-== Architecture ==
+== Non-shared disk architecture ==
 We are going to implement block replication from many basic
 blocks that are already in QEMU.
 
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
 
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  | (2)Forward and write through |             |
+                  | +--------------------------> | Disk Buffer |
+                  | |                            |             |
+                  | |                            \-------------/
+                  | |(1)read                           |
+                  | |                                  |
+       (3)write   | |                                  | backing file
+                  V |                                  |
+                 +-----------------------------+       |
+                 | Shared Disk                 | <-----+
+                 +-----------------------------+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Shared disk, the
+       original sector content will be read from Shared disk and
+       forwarded and buffered in the Disk buffer on the secondary site,
+       but it will not overwrite the existing sector content (it could be
+       from either "Secondary Write Requests" or previous COW of "Primary
+       Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+         virtio-blk                     ||                               .----------
+             /                          ||                               | Secondary
+            /                           ||                               '----------
+           /                            ||                                 virtio-blk
+          /                             ||                                      |
+          |                             ||                               replication(5)
+          |                    NBD  -------->   NBD   (2)                       |
+          |                  client     ||    server ---> hidden disk <-- active disk(4)
+          |                     ^       ||                      |
+          |              replication(1) ||                      |
+          |                     |       ||                      |
+          |   +-----------------'       ||                      |
+         (3)  |drive-backup sync=none   ||                      |
+--------. |   +-----------------+       ||                      |
+Primary | |                     |       ||           backing    |
+--------' |                     |       ||                      |
+          V                     |                               |
+       +-------------------------------------------+            |
+       |               shared disk                 | <----------+
+       +-------------------------------------------+
+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) The hidden-disk buffers the original content that is modified by the
+       primary VM. It should also be an empty disk, and the driver supports
+       bdrv_make_empty() and backing file.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the active disk and it
+       will overwrite the existing sector content in the buffer.
+
 == Failure Handling ==
 There are 7 internal errors when block replication is running:
 1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
    things except failover. The caller must hold the I/O mutex lock if it is
    in migration/checkpoint thread.
 
-== Usage ==
+== Non-shared disk usage ==
 Primary:
   -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
          children.0.file.filename=1.raw,\
@@ -234,6 +302,69 @@ Secondary:
   The primary host is down, so we should do the following thing:
   { 'execute': 'nbd-server-stop' }
 
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }
+        }
+     }
+  }
+
+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp command:
+1. { 'execute': 'nbd-server-start',
+     'arguments': {
+        'addr': {
+            'type': 'inet',
+            'data': {
+                'host': '0',
+                'port': 'yyy'
+            }
+        }
+     }
+   }
+2. { 'execute': 'nbd-server-add',
+     'arguments': {
+        'device': 'hidden_disk0',
+        'writable': true
+    }
+  }
+
+After Failover:
+Primary:
+  { 'execute': 'x-blockdev-del',
+    'arguments': {
+        'node-name': 'rep'
+    }
+  }
+
+Secondary:
+  {'execute': 'nbd-server-stop' }
+
 TODO:
 1. Continuous block replication
-2. Shared disk
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
@ 2017-01-20  3:47 ` zhanghailiang
  2017-02-27 17:10   ` Stefan Hajnoczi
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:47 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang, Eric Blake, Wen Congyang

We use these two options to identify which disk is
shared

Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
v3:
- Move g_free(s->shared_disk_id) to the common fail process place (Stefan)
- Fix comments for these two options
---
 block/replication.c  | 37 +++++++++++++++++++++++++++++++++++++
 qapi/block-core.json | 10 +++++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/block/replication.c b/block/replication.c
index 729dd12..b96a3e5 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
 typedef struct BDRVReplicationState {
     ReplicationMode mode;
     int replication_state;
+    bool is_shared_disk;
+    char *shared_disk_id;
     BdrvChild *active_disk;
     BdrvChild *hidden_disk;
     BdrvChild *secondary_disk;
+    BdrvChild *primary_disk;
     char *top_id;
     ReplicationState *rs;
     Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool failover,
 
 #define REPLICATION_MODE        "mode"
 #define REPLICATION_TOP_ID      "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
 static QemuOptsList replication_runtime_opts = {
     .name = "replication",
     .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
             .name = REPLICATION_TOP_ID,
             .type = QEMU_OPT_STRING,
         },
+        {
+            .name = REPLICATION_SHARED_DISK_ID,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = REPLICATION_SHARED_DISK,
+            .type = QEMU_OPT_BOOL,
+        },
         { /* end of list */ }
     },
 };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
     QemuOpts *opts = NULL;
     const char *mode;
     const char *top_id;
+    const char *shared_disk_id;
+    BlockBackend *blk;
+    BlockDriverState *tmp_bs;
 
     ret = -EINVAL;
     opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
@@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
                    "The option mode's value should be primary or secondary");
         goto fail;
     }
+    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+                                          false);
+    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+        if (!shared_disk_id) {
+            error_setg(&local_err, "Missing shared disk blk option");
+            goto fail;
+        }
+        s->shared_disk_id = g_strdup(shared_disk_id);
+        blk = blk_by_name(s->shared_disk_id);
+        if (!blk) {
+            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+            goto fail;
+        }
+        /* We can't access root member of BlockBackend directly */
+        tmp_bs = blk_bs(blk);
+        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+    }
 
     s->rs = replication_new(bs, &replication_ops);
 
     ret = 0;
 
 fail:
+    g_free(s->shared_disk_id);
     qemu_opts_del(opts);
     error_propagate(errp, local_err);
 
@@ -135,6 +171,7 @@ static void replication_close(BlockDriverState *bs)
 {
     BDRVReplicationState *s = bs->opaque;
 
+    g_free(s->shared_disk_id);
     if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
         replication_stop(s->rs, false, NULL);
     }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1b3e6eb..e9fecda 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2624,12 +2624,20 @@
 #          node who owns the replication node chain. Must not be given in
 #          primary mode.
 #
+# @shared-disk-id: #optional id of shared disk while is replication mode,
+#                  if @shared-disk is true, this option is required (Since: 2.9)
+#
+# @shared-disk: #optional (The default is false) to indicate whether or not
+#               a disk is shared by primary VM and secondary VM. (Since: 2.9)
+#
 # Since: 2.8
 ##
 { 'struct': 'BlockdevOptionsReplication',
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'mode': 'ReplicationMode',
-            '*top-id': 'str' } }
+            '*top-id': 'str',
+            '*shared-disk-id': 'str',
+            '*shared-disk': 'bool' } }
 
 ##
 # @NFSTransport:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
@ 2017-01-20  3:47 ` zhanghailiang
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:47 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang

The helper backup_do_checkpoint() will be used for primary related
codes. Here we split it out from secondary_do_checkpoint().

Besides, it is unnecessary to call backup_do_checkpoint() in
replication starting and normally stop replication path.
We only need call it while do real checkpointing.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 block/replication.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index b96a3e5..3a44728 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -332,20 +332,8 @@ static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
 
 static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 {
-    Error *local_err = NULL;
     int ret;
 
-    if (!s->secondary_disk->bs->job) {
-        error_setg(errp, "Backup job was cancelled unexpectedly");
-        return;
-    }
-
-    backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
     ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
     if (ret < 0) {
         error_setg(errp, "Cannot make active disk empty");
@@ -558,6 +546,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }
         block_job_start(job);
+
+        secondary_do_checkpoint(s, errp);
         break;
     default:
         aio_context_release(aio_context);
@@ -566,10 +556,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
     s->replication_state = BLOCK_REPLICATION_RUNNING;
 
-    if (s->mode == REPLICATION_MODE_SECONDARY) {
-        secondary_do_checkpoint(s, errp);
-    }
-
     s->error = 0;
     aio_context_release(aio_context);
 }
@@ -579,13 +565,29 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
     BlockDriverState *bs = rs->opaque;
     BDRVReplicationState *s;
     AioContext *aio_context;
+    Error *local_err = NULL;
 
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
     s = bs->opaque;
 
-    if (s->mode == REPLICATION_MODE_SECONDARY) {
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        if (!s->secondary_disk->bs->job) {
+            error_setg(errp, "Backup job was cancelled unexpectedly");
+            break;
+        }
+        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            break;
+        }
         secondary_do_checkpoint(s, errp);
+        break;
+    default:
+        abort();
     }
     aio_context_release(aio_context);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 4/6] replication: fix code logic with the new shared_disk option
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (2 preceding siblings ...)
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
@ 2017-01-20  3:47 ` zhanghailiang
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case zhanghailiang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:47 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang

Some code logic only be needed in non-shared disk, here
we adjust these codes to prepare for shared disk scenario.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 block/replication.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 3a44728..70ec08c 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -531,21 +531,28 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             aio_context_release(aio_context);
             return;
         }
-        bdrv_op_block_all(top_bs, s->blocker);
-        bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
-                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
+        /*
+         * Only in the case of non-shared disk,
+         * the backup job is in the secondary side
+         */
+        if (!s->is_shared_disk) {
+            bdrv_op_block_all(top_bs, s->blocker);
+            bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+            job = backup_job_create(NULL, s->secondary_disk->bs,
+                                s->hidden_disk->bs, 0,
+                                MIRROR_SYNC_MODE_NONE, NULL, false,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            backup_job_cleanup(bs);
-            aio_context_release(aio_context);
-            return;
+            if (local_err) {
+                error_propagate(errp, local_err);
+                backup_job_cleanup(bs);
+                aio_context_release(aio_context);
+                return;
+            }
+            block_job_start(job);
         }
-        block_job_start(job);
 
         secondary_do_checkpoint(s, errp);
         break;
@@ -575,14 +582,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
     case REPLICATION_MODE_PRIMARY:
         break;
     case REPLICATION_MODE_SECONDARY:
-        if (!s->secondary_disk->bs->job) {
-            error_setg(errp, "Backup job was cancelled unexpectedly");
-            break;
-        }
-        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            break;
+        if (!s->is_shared_disk) {
+            if (!s->secondary_disk->bs->job) {
+                error_setg(errp, "Backup job was cancelled unexpectedly");
+                break;
+            }
+            backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                break;
+            }
         }
         secondary_do_checkpoint(s, errp);
         break;
@@ -663,7 +672,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * before the BDS is closed, because we will access hidden
          * disk, secondary disk in backup_job_completed().
          */
-        if (s->secondary_disk->bs->job) {
+        if (!s->is_shared_disk && s->secondary_disk->bs->job) {
             block_job_cancel_sync(s->secondary_disk->bs->job);
         }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (3 preceding siblings ...)
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
@ 2017-01-20  3:47 ` zhanghailiang
  2017-02-27 17:37   ` Stefan Hajnoczi
  2017-01-20  3:48 ` [Qemu-devel] [PATCH v3 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver zhanghailiang
  2017-02-23 12:46 ` [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case Hailiang Zhang
  6 siblings, 1 reply; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:47 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang, Wen Congyang

Just as the scenario of non-shared disk block replication,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

         virtio-blk                     ||                               .----------
             /                          ||                               | Secondary
            /                           ||                               '----------
           /                            ||                                 virtio-blk
          /                             ||                                      |
          |                             ||                               replication(5)
          |                    NBD  -------->   NBD   (2)                       |
          |                  client     ||    server ---> hidden disk <-- active disk(4)
          |                     ^       ||                      |
          |              replication(1) ||                      |
          |                     |       ||                      |
          |   +-----------------'       ||                      |
         (3)  |drive-backup sync=none   ||                      |
--------. |   +-----------------+       ||                      |
Primary | |                     |       ||           backing    |
--------' |                     |       ||                      |
          V                     |                               |
       +-------------------------------------------+            |
       |               shared disk                 | <----------+
       +-------------------------------------------+

    1) Primary writes will read original data and forward it to Secondary
       QEMU.
    2) The hidden-disk is created automatically. It buffers the original content
       that is modified by the primary VM. It should also be an empty disk, and
       the driver supports bdrv_make_empty() and backing file.
    3) Primary write requests will be written to Shared disk.
    4) Secondary write requests will be buffered in the active disk and it
       will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
 block/replication.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 70ec08c..a0b3e41 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -233,7 +233,7 @@ static coroutine_fn int replication_co_readv(BlockDriverState *bs,
                                              QEMUIOVector *qiov)
 {
     BDRVReplicationState *s = bs->opaque;
-    BdrvChild *child = s->secondary_disk;
+    BdrvChild *child = s->is_shared_disk ? s->primary_disk : s->secondary_disk;
     BlockJob *job = NULL;
     CowRequest req;
     int ret;
@@ -415,7 +415,12 @@ static void backup_job_completed(void *opaque, int ret)
         s->error = -EIO;
     }
 
-    backup_job_cleanup(bs);
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        s->replication_state = BLOCK_REPLICATION_DONE;
+        s->error = 0;
+    } else {
+        backup_job_cleanup(bs);
+    }
 }
 
 static bool check_top_bs(BlockDriverState *top_bs, BlockDriverState *bs)
@@ -467,6 +472,19 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
     switch (s->mode) {
     case REPLICATION_MODE_PRIMARY:
+        if (s->is_shared_disk) {
+            job = backup_job_create(NULL, s->primary_disk->bs, bs, 0,
+                MIRROR_SYNC_MODE_NONE, NULL, false, BLOCKDEV_ON_ERROR_REPORT,
+                BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
+                backup_job_completed, bs, NULL, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                backup_job_cleanup(bs);
+                aio_context_release(aio_context);
+                return;
+            }
+            block_job_start(job);
+        }
         break;
     case REPLICATION_MODE_SECONDARY:
         s->active_disk = bs->file;
@@ -485,7 +503,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         }
 
         s->secondary_disk = s->hidden_disk->bs->backing;
-        if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) {
+        if (!s->secondary_disk->bs ||
+            (!s->is_shared_disk && !bdrv_has_blk(s->secondary_disk->bs))) {
             error_setg(errp, "The secondary disk doesn't have block backend");
             aio_context_release(aio_context);
             return;
@@ -580,11 +599,24 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
 
     switch (s->mode) {
     case REPLICATION_MODE_PRIMARY:
+        if (s->is_shared_disk) {
+            if (!s->primary_disk->bs->job) {
+                error_setg(errp, "Primary backup job was cancelled"
+                           " unexpectedly");
+                break;
+            }
+
+            backup_do_checkpoint(s->primary_disk->bs->job, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+            }
+        }
         break;
     case REPLICATION_MODE_SECONDARY:
         if (!s->is_shared_disk) {
             if (!s->secondary_disk->bs->job) {
-                error_setg(errp, "Backup job was cancelled unexpectedly");
+                error_setg(errp, "Secondary backup job was cancelled"
+                           " unexpectedly");
                 break;
             }
             backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
@@ -663,8 +695,12 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
 
     switch (s->mode) {
     case REPLICATION_MODE_PRIMARY:
-        s->replication_state = BLOCK_REPLICATION_DONE;
-        s->error = 0;
+        if (s->is_shared_disk && s->primary_disk->bs->job) {
+            block_job_cancel(s->primary_disk->bs->job);
+        } else {
+            s->replication_state = BLOCK_REPLICATION_DONE;
+            s->error = 0;
+        }
         break;
     case REPLICATION_MODE_SECONDARY:
         /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (4 preceding siblings ...)
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case zhanghailiang
@ 2017-01-20  3:48 ` zhanghailiang
  2017-02-23 12:46 ` [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case Hailiang Zhang
  6 siblings, 0 replies; 17+ messages in thread
From: zhanghailiang @ 2017-01-20  3:48 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst, xuquan8,
	zhanghailiang, Eric Blake, Wen Congyang

Without this callback, there will be an error reports in the primary side:
"qemu-system-x86_64: Couldn't determine the cluster size of the target image,
which has no backing file: Operation not supported
Aborting, since this may create an unusable destination image"

For nbd driver, it doesn't have cluster size, so here we return
a fake value for it.

This patch should be dropped if Eric's nbd patch be merged.
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03567.html

Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block/nbd.c         | 12 ++++++++++++
 block/replication.c |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 35f24be..b71a13d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -43,6 +43,8 @@
 
 #define EN_OPTSTR ":exportname="
 
+#define NBD_FAKE_CLUSTER_SIZE 512
+
 typedef struct BDRVNBDState {
     NBDClientSession client;
 
@@ -552,6 +554,13 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
     bs->full_open_options = opts;
 }
 
+static int nbd_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    bdi->cluster_size  = NBD_FAKE_CLUSTER_SIZE;
+
+    return 0;
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -569,6 +578,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_get_info              = nbd_get_info,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -588,6 +598,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_get_info              = nbd_get_info,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -607,6 +618,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_get_info              = nbd_get_info,
 };
 
 static void bdrv_nbd_init(void)
diff --git a/block/replication.c b/block/replication.c
index a0b3e41..e4f8069 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -731,6 +731,11 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
     aio_context_release(aio_context);
 }
 
+static int replication_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    return bdrv_get_info(bs->file->bs, bdi);
+}
+
 BlockDriver bdrv_replication = {
     .format_name                = "replication",
     .protocol_name              = "replication",
@@ -743,6 +748,7 @@ BlockDriver bdrv_replication = {
     .bdrv_co_readv              = replication_co_readv,
     .bdrv_co_writev             = replication_co_writev,
 
+    .bdrv_get_info              = replication_get_info,
     .is_filter                  = true,
     .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case
  2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (5 preceding siblings ...)
  2017-01-20  3:48 ` [Qemu-devel] [PATCH v3 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver zhanghailiang
@ 2017-02-23 12:46 ` Hailiang Zhang
  6 siblings, 0 replies; 17+ messages in thread
From: Hailiang Zhang @ 2017-02-23 12:46 UTC (permalink / raw)
  To: qemu-devel, stefanha
  Cc: xuquan8, qemu-block, kwolf, mreitz, xiecl.fnst, zhangchen.fnst,
	Juan Quintela, Amit Shah, Dr . David Alan Gilbert, eddie.dong

ping ... ?

On 2017/1/20 11:47, zhanghailiang wrote:
> COLO block replication doesn't support the shared disk case,
> Here we try to implement it and this is the third version.
>
> Last posted series patches:
> https://lists.gnu.org/archive/html/qemu-block/2016-12/msg00039.html
> You can refer to the above link if want to test it.
>
> I have uploaded the new version to github:
> https://github.com/coloft/qemu/tree/colo-developing-with-shared-disk-2016-1-20
>
> Please review and any commits are welcomed.
>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> Cc: Dr. David Alan Gilbert (git) <dgilbert@redhat.com>
> Cc: eddie.dong@intel.com
>
> v3:
> - Fix some comments from Stefan and Eric
>
> v2:
> - Drop the patch which add a blk_root() helper
> - Fix some comments from Changlong
>
> zhanghailiang (6):
>    docs/block-replication: Add description for shared-disk case
>    replication: add shared-disk and shared-disk-id options
>    replication: Split out backup_do_checkpoint() from
>      secondary_do_checkpoint()
>    replication: fix code logic with the new shared_disk option
>    replication: Implement block replication for shared disk case
>    nbd/replication: implement .bdrv_get_info() for nbd and replication
>      driver
>
>   block/nbd.c                |  12 ++++
>   block/replication.c        | 156 +++++++++++++++++++++++++++++++++++----------
>   docs/block-replication.txt | 139 ++++++++++++++++++++++++++++++++++++++--
>   qapi/block-core.json       |  10 ++-
>   4 files changed, 279 insertions(+), 38 deletions(-)
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
@ 2017-02-27 16:46   ` Stefan Hajnoczi
  2017-03-01  3:09     ` Hailiang Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2017-02-27 16:46 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, xiecl.fnst,
	zhangchen.fnst, xuquan8, Wen Congyang

[-- Attachment #1: Type: text/plain, Size: 552 bytes --]

On Fri, Jan 20, 2017 at 11:47:55AM +0800, zhanghailiang wrote:
> +Secondary:
> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
> +        backing.driver=raw,backing.file.filename=1.raw \
> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
> +        file.driver=qcow2,top-id=active-disk0,\
> +        file.file.filename=/mnt/ramfs/active_disk.img,\
> +        file.backing=hidden_disk0,shared-disk=on

I don't see where this patch series adds a -drive shared-disk=on option.
Did I miss it?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
@ 2017-02-27 17:10   ` Stefan Hajnoczi
  2017-03-01  3:53     ` Hailiang Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2017-02-27 17:10 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, xiecl.fnst,
	zhangchen.fnst, xuquan8, Eric Blake, Wen Congyang

[-- Attachment #1: Type: text/plain, Size: 1233 bytes --]

On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:
> @@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
>                     "The option mode's value should be primary or secondary");
>          goto fail;
>      }
> +    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
> +                                          false);
> +    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
> +        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
> +        if (!shared_disk_id) {
> +            error_setg(&local_err, "Missing shared disk blk option");
> +            goto fail;
> +        }
> +        s->shared_disk_id = g_strdup(shared_disk_id);
> +        blk = blk_by_name(s->shared_disk_id);
> +        if (!blk) {
> +            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
> +            goto fail;
> +        }
> +        /* We can't access root member of BlockBackend directly */
> +        tmp_bs = blk_bs(blk);
> +        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);

Why is this necessary?

We already have the BlockBackend for the primary disk.  I'm not sure why
the BdrvChild is needed.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case
  2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case zhanghailiang
@ 2017-02-27 17:37   ` Stefan Hajnoczi
  2017-03-07 14:30     ` Hailiang Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2017-02-27 17:37 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, xiecl.fnst,
	zhangchen.fnst, xuquan8, Wen Congyang

[-- Attachment #1: Type: text/plain, Size: 2929 bytes --]

On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:
> Just as the scenario of non-shared disk block replication,
> we are going to implement block replication from many basic
> blocks that are already in QEMU.
> The architecture is:
> 
>          virtio-blk                     ||                               .----------
>              /                          ||                               | Secondary
>             /                           ||                               '----------
>            /                            ||                                 virtio-blk
>           /                             ||                                      |
>           |                             ||                               replication(5)
>           |                    NBD  -------->   NBD   (2)                       |
>           |                  client     ||    server ---> hidden disk <-- active disk(4)
>           |                     ^       ||                      |
>           |              replication(1) ||                      |
>           |                     |       ||                      |
>           |   +-----------------'       ||                      |
>          (3)  |drive-backup sync=none   ||                      |
> --------. |   +-----------------+       ||                      |
> Primary | |                     |       ||           backing    |
> --------' |                     |       ||                      |
>           V                     |                               |
>        +-------------------------------------------+            |
>        |               shared disk                 | <----------+
>        +-------------------------------------------+
> 
>     1) Primary writes will read original data and forward it to Secondary
>        QEMU.
>     2) The hidden-disk is created automatically. It buffers the original content
>        that is modified by the primary VM. It should also be an empty disk, and
>        the driver supports bdrv_make_empty() and backing file.
>     3) Primary write requests will be written to Shared disk.
>     4) Secondary write requests will be buffered in the active disk and it
>        will overwrite the existing sector content in the buffer.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Are there any restrictions on the shared disk?  For example the -drive
cache= mode must be 'none'.  If the cache mode isn't 'none' the
secondary host might have old data in the host page cache.  The
Secondary QEMU would have an inconsistent view of the shared disk.

Are image file formats like qcow2 supported for the shared disk?  Extra
steps are required to achieve consistency, see bdrv_invalidate_cache().

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case
  2017-02-27 16:46   ` Stefan Hajnoczi
@ 2017-03-01  3:09     ` Hailiang Zhang
  2017-03-01 10:22       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: Hailiang Zhang @ 2017-03-01  3:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, xiecl.fnst,
	zhangchen.fnst, Wen Congyang

On 2017/2/28 0:46, Stefan Hajnoczi wrote:
> On Fri, Jan 20, 2017 at 11:47:55AM +0800, zhanghailiang wrote:
>> +Secondary:
>> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
>> +        backing.driver=raw,backing.file.filename=1.raw \
>> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
>> +        file.driver=qcow2,top-id=active-disk0,\
>> +        file.file.filename=/mnt/ramfs/active_disk.img,\
>> +        file.backing=hidden_disk0,shared-disk=on
>
> I don't see where this patch series adds a -drive shared-disk=on option.
> Did I miss it?
>

In patch 2, we add this option.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options
  2017-02-27 17:10   ` Stefan Hajnoczi
@ 2017-03-01  3:53     ` Hailiang Zhang
  2017-03-01 10:21       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: Hailiang Zhang @ 2017-03-01  3:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, xiecl.fnst,
	zhangchen.fnst, Eric Blake, Wen Congyang

On 2017/2/28 1:10, Stefan Hajnoczi wrote:
> On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:
>> @@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
>>                      "The option mode's value should be primary or secondary");
>>           goto fail;
>>       }
>> +    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
>> +                                          false);
>> +    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
>> +        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
>> +        if (!shared_disk_id) {
>> +            error_setg(&local_err, "Missing shared disk blk option");
>> +            goto fail;
>> +        }
>> +        s->shared_disk_id = g_strdup(shared_disk_id);
>> +        blk = blk_by_name(s->shared_disk_id);
>> +        if (!blk) {
>> +            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
>> +            goto fail;
>> +        }
>> +        /* We can't access root member of BlockBackend directly */
>> +        tmp_bs = blk_bs(blk);
>> +        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
>
> Why is this necessary?
>
> We already have the BlockBackend for the primary disk.  I'm not sure why
> the BdrvChild is needed.
>

Er, the helper backup_job_create() needs the BlockDriverState for the primary disk,
besides, we want to make it same with 'secondary_disk' in struct BDRVReplicationState.

Thanks.
Hailiang

> Stefan
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options
  2017-03-01  3:53     ` Hailiang Zhang
@ 2017-03-01 10:21       ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2017-03-01 10:21 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Stefan Hajnoczi, kwolf, xuquan8, xiecl.fnst, Wen Congyang,
	zhangchen.fnst, qemu-block, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 2274 bytes --]

On Wed, Mar 01, 2017 at 11:53:14AM +0800, Hailiang Zhang wrote:
> On 2017/2/28 1:10, Stefan Hajnoczi wrote:
> > On Fri, Jan 20, 2017 at 11:47:56AM +0800, zhanghailiang wrote:
> > > @@ -119,12 +136,31 @@ static int replication_open(BlockDriverState *bs, QDict *options,
> > >                      "The option mode's value should be primary or secondary");
> > >           goto fail;
> > >       }
> > > +    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
> > > +                                          false);
> > > +    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
> > > +        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
> > > +        if (!shared_disk_id) {
> > > +            error_setg(&local_err, "Missing shared disk blk option");
> > > +            goto fail;
> > > +        }
> > > +        s->shared_disk_id = g_strdup(shared_disk_id);
> > > +        blk = blk_by_name(s->shared_disk_id);
> > > +        if (!blk) {
> > > +            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
> > > +            goto fail;
> > > +        }
> > > +        /* We can't access root member of BlockBackend directly */
> > > +        tmp_bs = blk_bs(blk);
> > > +        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
> > 
> > Why is this necessary?
> > 
> > We already have the BlockBackend for the primary disk.  I'm not sure why
> > the BdrvChild is needed.
> > 
> 
> Er, the helper backup_job_create() needs the BlockDriverState for the primary disk,

You can get the BDS any time using blk_bs().

> besides, we want to make it same with 'secondary_disk' in struct BDRVReplicationState.

Block drivers are allowed to access BDS->parents so it's not necessarily
a problem that the code does this.  But it's rare and usually means
there is a non-standard relationship between BDS nodes.

It wasn't obvious from the code why you are going through the trouble of
getting a BdrvChild instead of using the BlockBackend (the normal way of
accessing a drive).

Perhaps a comment would help:

/* We have a BlockBackend for the primary disk but use BdrvChild for
 * consistency - active_disk, secondary_disk, etc are also BdrvChild.
 */

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case
  2017-03-01  3:09     ` Hailiang Zhang
@ 2017-03-01 10:22       ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2017-03-01 10:22 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Stefan Hajnoczi, kwolf, xuquan8, xiecl.fnst, Wen Congyang,
	zhangchen.fnst, qemu-block, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

On Wed, Mar 01, 2017 at 11:09:11AM +0800, Hailiang Zhang wrote:
> On 2017/2/28 0:46, Stefan Hajnoczi wrote:
> > On Fri, Jan 20, 2017 at 11:47:55AM +0800, zhanghailiang wrote:
> > > +Secondary:
> > > + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
> > > +        backing.driver=raw,backing.file.filename=1.raw \
> > > + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
> > > +        file.driver=qcow2,top-id=active-disk0,\
> > > +        file.file.filename=/mnt/ramfs/active_disk.img,\
> > > +        file.backing=hidden_disk0,shared-disk=on
> > 
> > I don't see where this patch series adds a -drive shared-disk=on option.
> > Did I miss it?
> > 
> 
> In patch 2, we add this option.

Of course!  For some reason I thought it was a new global -drive option.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case
  2017-02-27 17:37   ` Stefan Hajnoczi
@ 2017-03-07 14:30     ` Hailiang Zhang
  2017-03-10  4:17       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: Hailiang Zhang @ 2017-03-07 14:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, xiecl.fnst,
	zhangchen.fnst, wencongyang2

Hi Stefan,

Sorry for the delayed reply.

On 2017/2/28 1:37, Stefan Hajnoczi wrote:
> On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:
>> Just as the scenario of non-shared disk block replication,
>> we are going to implement block replication from many basic
>> blocks that are already in QEMU.
>> The architecture is:
>>
>>           virtio-blk                     ||                               .----------
>>               /                          ||                               | Secondary
>>              /                           ||                               '----------
>>             /                            ||                                 virtio-blk
>>            /                             ||                                      |
>>            |                             ||                               replication(5)
>>            |                    NBD  -------->   NBD   (2)                       |
>>            |                  client     ||    server ---> hidden disk <-- active disk(4)
>>            |                     ^       ||                      |
>>            |              replication(1) ||                      |
>>            |                     |       ||                      |
>>            |   +-----------------'       ||                      |
>>           (3)  |drive-backup sync=none   ||                      |
>> --------. |   +-----------------+       ||                      |
>> Primary | |                     |       ||           backing    |
>> --------' |                     |       ||                      |
>>            V                     |                               |
>>         +-------------------------------------------+            |
>>         |               shared disk                 | <----------+
>>         +-------------------------------------------+
>>
>>      1) Primary writes will read original data and forward it to Secondary
>>         QEMU.
>>      2) The hidden-disk is created automatically. It buffers the original content
>>         that is modified by the primary VM. It should also be an empty disk, and
>>         the driver supports bdrv_make_empty() and backing file.
>>      3) Primary write requests will be written to Shared disk.
>>      4) Secondary write requests will be buffered in the active disk and it
>>         will overwrite the existing sector content in the buffer.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>
> Are there any restrictions on the shared disk?  For example the -drive
> cache= mode must be 'none'.  If the cache mode isn't 'none' the
> secondary host might have old data in the host page cache.  The

While do checkpoint, we will call vm_stop(), in which, the bdrv_flush_all()
will be called, is it enough ?

> Secondary QEMU would have an inconsistent view of the shared disk.
>
> Are image file formats like qcow2 supported for the shared disk?  Extra

In the above scenario, it has no limitation of formats for the shared disk.

> steps are required to achieve consistency, see bdrv_invalidate_cache().
>

Hmm, in that case, we should call bdrv_invalidate_cache_all() while checkpoint.


Thanks,
Hailiang

> Stefan
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v3 5/6] replication: Implement block replication for shared disk case
  2017-03-07 14:30     ` Hailiang Zhang
@ 2017-03-10  4:17       ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2017-03-10  4:17 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Stefan Hajnoczi, kwolf, xuquan8, xiecl.fnst, zhangchen.fnst,
	qemu-block, wencongyang2, qemu-devel, mreitz

[-- Attachment #1: Type: text/plain, Size: 3816 bytes --]

On Tue, Mar 07, 2017 at 10:30:30PM +0800, Hailiang Zhang wrote:
> Hi Stefan,
> 
> Sorry for the delayed reply.
> 
> On 2017/2/28 1:37, Stefan Hajnoczi wrote:
> > On Fri, Jan 20, 2017 at 11:47:59AM +0800, zhanghailiang wrote:
> > > Just as the scenario of non-shared disk block replication,
> > > we are going to implement block replication from many basic
> > > blocks that are already in QEMU.
> > > The architecture is:
> > > 
> > >           virtio-blk                     ||                               .----------
> > >               /                          ||                               | Secondary
> > >              /                           ||                               '----------
> > >             /                            ||                                 virtio-blk
> > >            /                             ||                                      |
> > >            |                             ||                               replication(5)
> > >            |                    NBD  -------->   NBD   (2)                       |
> > >            |                  client     ||    server ---> hidden disk <-- active disk(4)
> > >            |                     ^       ||                      |
> > >            |              replication(1) ||                      |
> > >            |                     |       ||                      |
> > >            |   +-----------------'       ||                      |
> > >           (3)  |drive-backup sync=none   ||                      |
> > > --------. |   +-----------------+       ||                      |
> > > Primary | |                     |       ||           backing    |
> > > --------' |                     |       ||                      |
> > >            V                     |                               |
> > >         +-------------------------------------------+            |
> > >         |               shared disk                 | <----------+
> > >         +-------------------------------------------+
> > > 
> > >      1) Primary writes will read original data and forward it to Secondary
> > >         QEMU.
> > >      2) The hidden-disk is created automatically. It buffers the original content
> > >         that is modified by the primary VM. It should also be an empty disk, and
> > >         the driver supports bdrv_make_empty() and backing file.
> > >      3) Primary write requests will be written to Shared disk.
> > >      4) Secondary write requests will be buffered in the active disk and it
> > >         will overwrite the existing sector content in the buffer.
> > > 
> > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > > Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> > > Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
> > 
> > Are there any restrictions on the shared disk?  For example the -drive
> > cache= mode must be 'none'.  If the cache mode isn't 'none' the
> > secondary host might have old data in the host page cache.  The
> 
> While do checkpoint, we will call vm_stop(), in which, the bdrv_flush_all()
> will be called, is it enough ?
> 
> > Secondary QEMU would have an inconsistent view of the shared disk.
> > 
> > Are image file formats like qcow2 supported for the shared disk?  Extra
> 
> In the above scenario, it has no limitation of formats for the shared disk.
> 
> > steps are required to achieve consistency, see bdrv_invalidate_cache().
> > 
> 
> Hmm, in that case, we should call bdrv_invalidate_cache_all() while checkpoint.

Yes, it's not enough to just call bdrv_drain_all()/bdrv_flush_all().
The Secondary may need to reread metadata that is loaded in memory (e.g.
qcow2's L2 table cache) so bdrv_invalidate_cache() is needed.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-03-10  4:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-20  3:47 [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case zhanghailiang
2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
2017-02-27 16:46   ` Stefan Hajnoczi
2017-03-01  3:09     ` Hailiang Zhang
2017-03-01 10:22       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
2017-02-27 17:10   ` Stefan Hajnoczi
2017-03-01  3:53     ` Hailiang Zhang
2017-03-01 10:21       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
2017-01-20  3:47 ` [Qemu-devel] [PATCH v3 5/6] replication: Implement block replication for shared disk case zhanghailiang
2017-02-27 17:37   ` Stefan Hajnoczi
2017-03-07 14:30     ` Hailiang Zhang
2017-03-10  4:17       ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-01-20  3:48 ` [Qemu-devel] [PATCH v3 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver zhanghailiang
2017-02-23 12:46 ` [Qemu-devel] [PATCH v3 0/6] COLO block replication supports shared disk case Hailiang Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.