All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case
@ 2016-12-05  8:34 zhanghailiang
  2016-12-05  8:34 ` [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:34 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst,
	zhanghailiang, Juan Quintela, Amit Shah, Dr . David Alan Gilbert,
	eddie.dong

COLO block replication doesn't support the shared disk case,
Here we try to implement it.

For the detail of shared-disk scenario, please refer to patch 1.

COLO codes with shared-disk block replication can be found from the link:
https://github.com/coloft/qemu/tree/colo-developing-with-shared-disk-2016-12-5

Test procedures:
1. Secondary:
# x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :9 -name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,backing.driver=raw,backing.file.filename=/work/kvm/suse11_sp3_64  -drive if=ide,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfs/active_disk.img,file.backing=hidden_disk0,shared-disk=on -incoming tcp:0:9999

Issue qmp commands:
{'execute':'qmp_capabilities'}
{'execute': 'nbd-server-start', 'arguments': {'addr': {'type': 'inet', 'data': {'host': '0', 'port': '9998'} } } }
{'execute': 'nbd-server-add', 'arguments': {'device': 'hidden_disk0', 'writable': true } }

2.Primary:
# x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 -qmp stdio -vnc :9 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -drive if=virtio,id=primary_disk0,file.filename=/work/kvm/suse11_sp3_64,driver=raw -S

Issue qmp commands:
{'execute':'qmp_capabilities'}
{'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_add -n buddy driver=replication,mode=primary,file.driver=nbd,file.host=9.42.3.17,file.port=9998,file.export=hidden_disk0,shared-disk-id=primary_disk0,shared-disk=on,node-name=rep'}}
{'execute': 'migrate-set-capabilities', 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } ] } }
{'execute': 'migrate', 'arguments': {'uri': 'tcp:9.42.3.17:9999' } }

3. Failover
Secondary side:
Issue qmp commands:
{ 'execute': 'nbd-server-stop' }
{ "execute": "x-colo-lost-heartbeat" }

Please review and any commits are welcomed.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com> 
Cc: Dr. David Alan Gilbert (git) <dgilbert@redhat.com>
Cc: eddie.dong@intel.com

v2:
- Drop the patch which add a blk_root() helper
- Fix some comments from Changlong

zhanghailiang (6):
  docs/block-replication: Add description for shared-disk case
  replication: add shared-disk and shared-disk-id options
  replication: Split out backup_do_checkpoint() from
    secondary_do_checkpoint()
  replication: fix code logic with the new shared_disk option
  replication: Implement block replication for shared disk case
  nbd/replication: implement .bdrv_get_info() for nbd and replication
    driver

 block/nbd.c                |  12 ++++
 block/replication.c        | 156 +++++++++++++++++++++++++++++++++++----------
 docs/block-replication.txt | 139 ++++++++++++++++++++++++++++++++++++++--
 qapi/block-core.json       |   9 ++-
 4 files changed, 278 insertions(+), 38 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
  2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
@ 2016-12-05  8:34 ` zhanghailiang
  2016-12-20 11:23   ` Changlong Xie
  2017-01-13 13:41   ` Stefan Hajnoczi
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:34 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst,
	zhanghailiang, Zhang Chen

Introuduce the scenario of shared-disk block replication
and how to use it.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
v2:
- fix some problems found by Changlong
---
 docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 135 insertions(+), 4 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 6bde673..fbfe005 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
 effort during a vmstate checkpoint, the disk modification operations of
 the Primary disk are asynchronously forwarded to the Secondary node.
 
-== Workflow ==
+== Non-shared disk workflow ==
 The following is the image of block replication workflow:
 
         +----------------------+            +------------------------+
@@ -57,7 +57,7 @@ The following is the image of block replication workflow:
     4) Secondary write requests will be buffered in the Disk buffer and it
        will overwrite the existing sector content in the buffer.
 
-== Architecture ==
+== Non-shared disk architecture ==
 We are going to implement block replication from many basic
 blocks that are already in QEMU.
 
@@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
 
+== Shared Disk Mode Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  | (2)Forward and write through |             |
+                  | +--------------------------> | Disk Buffer |
+                  | |                            |             |
+                  | |                            \-------------/
+                  | |(1)read                           |
+                  | |                                  |
+       (3)write   | |                                  | backing file
+                  V |                                  |
+                 +-----------------------------+       |
+                 | Shared Disk                 | <-----+
+                 +-----------------------------+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Shared disk, the
+       original sector content will be read from Shared disk and
+       forwarded and buffered in the Disk buffer on the secondary site,
+       but it will not overwrite the existing sector content (it could be
+       from either "Secondary Write Requests" or previous COW of "Primary
+       Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Shared Disk Mode Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+         virtio-blk                     ||                               .----------
+             /                          ||                               | Secondary
+            /                           ||                               '----------
+           /                            ||                                 virtio-blk
+          /                             ||                                      |
+          |                             ||                               replication(5)
+          |                    NBD  -------->   NBD   (2)                       |
+          |                  client     ||    server ---> hidden disk <-- active disk(4)
+          |                     ^       ||                      |
+          |              replication(1) ||                      |
+          |                     |       ||                      |
+          |   +-----------------'       ||                      |
+         (3)  |drive-backup sync=none   ||                      |
+--------. |   +-----------------+       ||                      |
+Primary | |                     |       ||           backing    |
+--------' |                     |       ||                      |
+          V                     |                               |
+       +-------------------------------------------+            |
+       |               shared disk                 | <----------+
+       +-------------------------------------------+
+
+
+    1) Primary writes will read original data and forward it to Secondary
+       QEMU.
+    2) The hidden-disk buffers the original content that is modified by the
+       primary VM. It should also be an empty disk, and the driver supports
+       bdrv_make_empty() and backing file.
+    3) Primary write requests will be written to Shared disk.
+    4) Secondary write requests will be buffered in the active disk and it
+       will overwrite the existing sector content in the buffer.
+
 == Failure Handling ==
 There are 7 internal errors when block replication is running:
 1. I/O error on primary disk
@@ -145,7 +213,7 @@ d. replication_stop_all()
    things except failover. The caller must hold the I/O mutex lock if it is
    in migration/checkpoint thread.
 
-== Usage ==
+== Non-shared disk usage ==
 Primary:
   -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
          children.0.file.filename=1.raw,\
@@ -234,6 +302,69 @@ Secondary:
   The primary host is down, so we should do the following thing:
   { 'execute': 'nbd-server-stop' }
 
+== Shared disk usage ==
+Primary:
+ -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
+
+Issue qmp command:
+  { 'execute': 'blockdev-add',
+    'arguments': {
+        'driver': 'replication',
+        'node-name': 'rep',
+        'mode': 'primary',
+        'shared-disk-id': 'primary_disk0',
+        'shared-disk': true,
+        'file': {
+            'driver': 'nbd',
+            'export': 'hidden_disk0',
+            'server': {
+                'type': 'inet',
+                'data': {
+                    'host': 'xxx.xxx.xxx.xxx',
+                    'port': 'yyy'
+                }
+            }
+        }
+     }
+  }
+
+Secondary:
+ -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
+        backing.driver=raw,backing.file.filename=1.raw \
+ -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
+        file.driver=qcow2,top-id=active-disk0,\
+        file.file.filename=/mnt/ramfs/active_disk.img,\
+        file.backing=hidden_disk0,shared-disk=on
+
+Issue qmp command:
+1. { 'execute': 'nbd-server-start',
+     'arguments': {
+        'addr': {
+            'type': 'inet',
+            'data': {
+                'host': '0',
+                'port': 'yyy'
+            }
+        }
+     }
+   }
+2. { 'execute': 'nbd-server-add',
+     'arguments': {
+        'device': 'hidden_disk0',
+        'writable': true
+    }
+  }
+
+After Failover:
+Primary:
+  { 'execute': 'x-blockdev-del',
+    'arguments': {
+        'node-name': 'rep'
+    }
+  }
+
+Secondary:
+  {'execute': 'nbd-server-stop' }
+
 TODO:
 1. Continuous block replication
-2. Shared disk
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options
  2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
  2016-12-05  8:34 ` [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
@ 2016-12-05  8:35 ` zhanghailiang
  2016-12-05 16:22   ` Eric Blake
                     ` (2 more replies)
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:35 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst,
	zhanghailiang, Zhang Chen

We use these two options to identify which disk is
shared

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
v2:
- add these two options for BlockdevOptionsReplication to support
  qmp blockdev-add command.
- fix a memory leak found by Changlong
---
 block/replication.c  | 37 +++++++++++++++++++++++++++++++++++++
 qapi/block-core.json |  9 ++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/block/replication.c b/block/replication.c
index 729dd12..e87ae87 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -25,9 +25,12 @@
 typedef struct BDRVReplicationState {
     ReplicationMode mode;
     int replication_state;
+    bool is_shared_disk;
+    char *shared_disk_id;
     BdrvChild *active_disk;
     BdrvChild *hidden_disk;
     BdrvChild *secondary_disk;
+    BdrvChild *primary_disk;
     char *top_id;
     ReplicationState *rs;
     Error *blocker;
@@ -53,6 +56,9 @@ static void replication_stop(ReplicationState *rs, bool failover,
 
 #define REPLICATION_MODE        "mode"
 #define REPLICATION_TOP_ID      "top-id"
+#define REPLICATION_SHARED_DISK "shared-disk"
+#define REPLICATION_SHARED_DISK_ID "shared-disk-id"
+
 static QemuOptsList replication_runtime_opts = {
     .name = "replication",
     .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
@@ -65,6 +71,14 @@ static QemuOptsList replication_runtime_opts = {
             .name = REPLICATION_TOP_ID,
             .type = QEMU_OPT_STRING,
         },
+        {
+            .name = REPLICATION_SHARED_DISK_ID,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = REPLICATION_SHARED_DISK,
+            .type = QEMU_OPT_BOOL,
+        },
         { /* end of list */ }
     },
 };
@@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
     QemuOpts *opts = NULL;
     const char *mode;
     const char *top_id;
+    const char *shared_disk_id;
+    BlockBackend *blk;
+    BlockDriverState *tmp_bs;
 
     ret = -EINVAL;
     opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
@@ -119,6 +136,25 @@ static int replication_open(BlockDriverState *bs, QDict *options,
                    "The option mode's value should be primary or secondary");
         goto fail;
     }
+    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
+                                          false);
+    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
+        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
+        if (!shared_disk_id) {
+            error_setg(&local_err, "Missing shared disk blk");
+            goto fail;
+        }
+        s->shared_disk_id = g_strdup(shared_disk_id);
+        blk = blk_by_name(s->shared_disk_id);
+        if (!blk) {
+            g_free(s->shared_disk_id);
+            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
+            goto fail;
+        }
+        /* We can't access root member of BlockBackend directly */
+        tmp_bs = blk_bs(blk);
+        s->primary_disk = QLIST_FIRST(&tmp_bs->parents);
+    }
 
     s->rs = replication_new(bs, &replication_ops);
 
@@ -135,6 +171,7 @@ static void replication_close(BlockDriverState *bs)
 {
     BDRVReplicationState *s = bs->opaque;
 
+    g_free(s->shared_disk_id);
     if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
         replication_stop(s->rs, false, NULL);
     }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c29bef7..52d7e0d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2232,12 +2232,19 @@
 #          node who owns the replication node chain. Must not be given in
 #          primary mode.
 #
+# @shared-disk-id: #optional The id of shared disk while in replication mode.
+#
+# @shared-disk: #optional To indicate whether or not a disk is shared by
+#               primary VM and secondary VM.
+#
 # Since: 2.8
 ##
 { 'struct': 'BlockdevOptionsReplication',
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'mode': 'ReplicationMode',
-            '*top-id': 'str' } }
+            '*top-id': 'str',
+            '*shared-disk-id': 'str',
+            '*shared-disk': 'bool' } }
 
 ##
 # @NFSTransport
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()
  2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
  2016-12-05  8:34 ` [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
@ 2016-12-05  8:35 ` zhanghailiang
  2016-12-20 12:41   ` Changlong Xie
  2017-01-17 13:10   ` Stefan Hajnoczi
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:35 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst, zhanghailiang

The helper backup_do_checkpoint() will be used for primary related
codes. Here we split it out from secondary_do_checkpoint().

Besides, it is unnecessary to call backup_do_checkpoint() in
replication starting and normally stop replication path.
We only need call it while do real checkpointing.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 block/replication.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index e87ae87..35e9ab3 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -332,20 +332,8 @@ static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
 
 static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
 {
-    Error *local_err = NULL;
     int ret;
 
-    if (!s->secondary_disk->bs->job) {
-        error_setg(errp, "Backup job was cancelled unexpectedly");
-        return;
-    }
-
-    backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
     ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
     if (ret < 0) {
         error_setg(errp, "Cannot make active disk empty");
@@ -558,6 +546,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             return;
         }
         block_job_start(job);
+
+        secondary_do_checkpoint(s, errp);
         break;
     default:
         aio_context_release(aio_context);
@@ -566,10 +556,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
     s->replication_state = BLOCK_REPLICATION_RUNNING;
 
-    if (s->mode == REPLICATION_MODE_SECONDARY) {
-        secondary_do_checkpoint(s, errp);
-    }
-
     s->error = 0;
     aio_context_release(aio_context);
 }
@@ -579,13 +565,29 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
     BlockDriverState *bs = rs->opaque;
     BDRVReplicationState *s;
     AioContext *aio_context;
+    Error *local_err = NULL;
 
     aio_context = bdrv_get_aio_context(bs);
     aio_context_acquire(aio_context);
     s = bs->opaque;
 
-    if (s->mode == REPLICATION_MODE_SECONDARY) {
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        if (!s->secondary_disk->bs->job) {
+            error_setg(errp, "Backup job was cancelled unexpectedly");
+            break;
+        }
+        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            break;
+        }
         secondary_do_checkpoint(s, errp);
+        break;
+    default:
+        abort();
     }
     aio_context_release(aio_context);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option
  2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (2 preceding siblings ...)
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
@ 2016-12-05  8:35 ` zhanghailiang
  2016-12-20 12:42   ` Changlong Xie
  2017-01-17 13:15   ` Stefan Hajnoczi
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case zhanghailiang
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver zhanghailiang
  5 siblings, 2 replies; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:35 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst, zhanghailiang

Some code logic only be needed in non-shared disk, here
we adjust these codes to prepare for shared disk scenario.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
 block/replication.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 35e9ab3..6574cc2 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -531,21 +531,28 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
             aio_context_release(aio_context);
             return;
         }
-        bdrv_op_block_all(top_bs, s->blocker);
-        bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
 
-        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
-                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
+        /*
+         * Only in the case of non-shared disk,
+         * the backup job is in the secondary side
+         */
+        if (!s->is_shared_disk) {
+            bdrv_op_block_all(top_bs, s->blocker);
+            bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+            job = backup_job_create(NULL, s->secondary_disk->bs,
+                                s->hidden_disk->bs, 0,
+                                MIRROR_SYNC_MODE_NONE, NULL, false,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            backup_job_cleanup(bs);
-            aio_context_release(aio_context);
-            return;
+            if (local_err) {
+                error_propagate(errp, local_err);
+                backup_job_cleanup(bs);
+                aio_context_release(aio_context);
+                return;
+            }
+            block_job_start(job);
         }
-        block_job_start(job);
 
         secondary_do_checkpoint(s, errp);
         break;
@@ -575,14 +582,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
     case REPLICATION_MODE_PRIMARY:
         break;
     case REPLICATION_MODE_SECONDARY:
-        if (!s->secondary_disk->bs->job) {
-            error_setg(errp, "Backup job was cancelled unexpectedly");
-            break;
-        }
-        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
-        if (local_err) {
-            error_propagate(errp, local_err);
-            break;
+        if (!s->is_shared_disk) {
+            if (!s->secondary_disk->bs->job) {
+                error_setg(errp, "Backup job was cancelled unexpectedly");
+                break;
+            }
+            backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                break;
+            }
         }
         secondary_do_checkpoint(s, errp);
         break;
@@ -663,7 +672,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
          * before the BDS is closed, because we will access hidden
          * disk, secondary disk in backup_job_completed().
          */
-        if (s->secondary_disk->bs->job) {
+        if (!s->is_shared_disk && s->secondary_disk->bs->job) {
             block_job_cancel_sync(s->secondary_disk->bs->job);
         }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case
  2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (3 preceding siblings ...)
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
@ 2016-12-05  8:35 ` zhanghailiang
  2017-01-17 13:19   ` Stefan Hajnoczi
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver zhanghailiang
  5 siblings, 1 reply; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:35 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst,
	zhanghailiang, Zhang Chen

Just as the scenario of non-shared disk block replication,
we are going to implement block replication from many basic
blocks that are already in QEMU.
The architecture is:

         virtio-blk                     ||                               .----------
             /                          ||                               | Secondary
            /                           ||                               '----------
           /                            ||                                 virtio-blk
          /                             ||                                      |
          |                             ||                               replication(5)
          |                    NBD  -------->   NBD   (2)                       |
          |                  client     ||    server ---> hidden disk <-- active disk(4)
          |                     ^       ||                      |
          |              replication(1) ||                      |
          |                     |       ||                      |
          |   +-----------------'       ||                      |
         (3)  |drive-backup sync=none   ||                      |
--------. |   +-----------------+       ||                      |
Primary | |                     |       ||           backing    |
--------' |                     |       ||                      |
          V                     |                               |
       +-------------------------------------------+            |
       |               shared disk                 | <----------+
       +-------------------------------------------+

    1) Primary writes will read original data and forward it to Secondary
       QEMU.
    2) The hidden-disk is created automatically. It buffers the original content
       that is modified by the primary VM. It should also be an empty disk, and
       the driver supports bdrv_make_empty() and backing file.
    3) Primary write requests will be written to Shared disk.
    4) Secondary write requests will be buffered in the active disk and it
       will overwrite the existing sector content in the buffer.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
---
 block/replication.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/block/replication.c b/block/replication.c
index 6574cc2..f416ca5 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -233,7 +233,7 @@ static coroutine_fn int replication_co_readv(BlockDriverState *bs,
                                              QEMUIOVector *qiov)
 {
     BDRVReplicationState *s = bs->opaque;
-    BdrvChild *child = s->secondary_disk;
+    BdrvChild *child = s->is_shared_disk ? s->primary_disk : s->secondary_disk;
     BlockJob *job = NULL;
     CowRequest req;
     int ret;
@@ -415,7 +415,12 @@ static void backup_job_completed(void *opaque, int ret)
         s->error = -EIO;
     }
 
-    backup_job_cleanup(bs);
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        s->replication_state = BLOCK_REPLICATION_DONE;
+        s->error = 0;
+    } else {
+        backup_job_cleanup(bs);
+    }
 }
 
 static bool check_top_bs(BlockDriverState *top_bs, BlockDriverState *bs)
@@ -467,6 +472,19 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
     switch (s->mode) {
     case REPLICATION_MODE_PRIMARY:
+        if (s->is_shared_disk) {
+            job = backup_job_create(NULL, s->primary_disk->bs, bs, 0,
+                MIRROR_SYNC_MODE_NONE, NULL, false, BLOCKDEV_ON_ERROR_REPORT,
+                BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
+                backup_job_completed, bs, NULL, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                backup_job_cleanup(bs);
+                aio_context_release(aio_context);
+                return;
+            }
+            block_job_start(job);
+        }
         break;
     case REPLICATION_MODE_SECONDARY:
         s->active_disk = bs->file;
@@ -485,7 +503,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
         }
 
         s->secondary_disk = s->hidden_disk->bs->backing;
-        if (!s->secondary_disk->bs || !bdrv_has_blk(s->secondary_disk->bs)) {
+        if (!s->secondary_disk->bs ||
+            (!s->is_shared_disk && !bdrv_has_blk(s->secondary_disk->bs))) {
             error_setg(errp, "The secondary disk doesn't have block backend");
             aio_context_release(aio_context);
             return;
@@ -580,11 +599,24 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
 
     switch (s->mode) {
     case REPLICATION_MODE_PRIMARY:
+        if (s->is_shared_disk) {
+            if (!s->primary_disk->bs->job) {
+                error_setg(errp, "Primary backup job was cancelled"
+                           " unexpectedly");
+                break;
+            }
+
+            backup_do_checkpoint(s->primary_disk->bs->job, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+            }
+        }
         break;
     case REPLICATION_MODE_SECONDARY:
         if (!s->is_shared_disk) {
             if (!s->secondary_disk->bs->job) {
-                error_setg(errp, "Backup job was cancelled unexpectedly");
+                error_setg(errp, "Secondary backup job was cancelled"
+                           " unexpectedly");
                 break;
             }
             backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
@@ -663,8 +695,12 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
 
     switch (s->mode) {
     case REPLICATION_MODE_PRIMARY:
-        s->replication_state = BLOCK_REPLICATION_DONE;
-        s->error = 0;
+        if (s->is_shared_disk && s->primary_disk->bs->job) {
+            block_job_cancel(s->primary_disk->bs->job);
+        } else {
+            s->replication_state = BLOCK_REPLICATION_DONE;
+            s->error = 0;
+        }
         break;
     case REPLICATION_MODE_SECONDARY:
         /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [Qemu-devel] [PATCH RFC v2 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver
  2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
                   ` (4 preceding siblings ...)
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case zhanghailiang
@ 2016-12-05  8:35 ` zhanghailiang
  5 siblings, 0 replies; 24+ messages in thread
From: zhanghailiang @ 2016-12-05  8:35 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, xiecl.fnst,
	zhanghailiang, Eric Blake

Without this callback, there will be an error reports in the primary side:
"qemu-system-x86_64: Couldn't determine the cluster size of the target image,
which has no backing file: Operation not supported
Aborting, since this may create an unusable destination image"

For nbd driver, it doesn't have cluster size, so here we return
a fake value for it.

This patch should be dropped if Eric's nbd patch be merged.
https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03567.html

Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block/nbd.c         | 12 ++++++++++++
 block/replication.c |  6 ++++++
 2 files changed, 18 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 35f24be..b71a13d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -43,6 +43,8 @@
 
 #define EN_OPTSTR ":exportname="
 
+#define NBD_FAKE_CLUSTER_SIZE 512
+
 typedef struct BDRVNBDState {
     NBDClientSession client;
 
@@ -552,6 +554,13 @@ static void nbd_refresh_filename(BlockDriverState *bs, QDict *options)
     bs->full_open_options = opts;
 }
 
+static int nbd_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    bdi->cluster_size  = NBD_FAKE_CLUSTER_SIZE;
+
+    return 0;
+}
+
 static BlockDriver bdrv_nbd = {
     .format_name                = "nbd",
     .protocol_name              = "nbd",
@@ -569,6 +578,7 @@ static BlockDriver bdrv_nbd = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_get_info              = nbd_get_info,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -588,6 +598,7 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_get_info              = nbd_get_info,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -607,6 +618,7 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_get_info              = nbd_get_info,
 };
 
 static void bdrv_nbd_init(void)
diff --git a/block/replication.c b/block/replication.c
index f416ca5..5f14360 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -731,6 +731,11 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
     aio_context_release(aio_context);
 }
 
+static int replication_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    return bdrv_get_info(bs->file->bs, bdi);
+}
+
 BlockDriver bdrv_replication = {
     .format_name                = "replication",
     .protocol_name              = "replication",
@@ -743,6 +748,7 @@ BlockDriver bdrv_replication = {
     .bdrv_co_readv              = replication_co_readv,
     .bdrv_co_writev             = replication_co_writev,
 
+    .bdrv_get_info              = replication_get_info,
     .is_filter                  = true,
     .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
@ 2016-12-05 16:22   ` Eric Blake
  2017-01-18  6:58     ` Hailiang Zhang
  2016-12-20 11:34   ` Changlong Xie
  2017-01-17 11:25   ` Stefan Hajnoczi
  2 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2016-12-05 16:22 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel, qemu-block
  Cc: kwolf, xiecl.fnst, Zhang Chen, wency, mreitz, stefanha, pbonzini

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

On 12/05/2016 02:35 AM, zhanghailiang wrote:
> We use these two options to identify which disk is
> shared
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
> ---
> v2:
> - add these two options for BlockdevOptionsReplication to support
>   qmp blockdev-add command.
> - fix a memory leak found by Changlong
> ---

> +++ b/qapi/block-core.json
> @@ -2232,12 +2232,19 @@
>  #          node who owns the replication node chain. Must not be given in
>  #          primary mode.
>  #
> +# @shared-disk-id: #optional The id of shared disk while in replication mode.
> +#
> +# @shared-disk: #optional To indicate whether or not a disk is shared by
> +#               primary VM and secondary VM.

Both of these need a '(since 2.9)' designation since they've missed 2.8,
and could probably also use documentation on the default value assumed
when the parameter is omitted.

> +#
>  # Since: 2.8
>  ##
>  { 'struct': 'BlockdevOptionsReplication',
>    'base': 'BlockdevOptionsGenericFormat',
>    'data': { 'mode': 'ReplicationMode',
> -            '*top-id': 'str' } }
> +            '*top-id': 'str',
> +            '*shared-disk-id': 'str',
> +            '*shared-disk': 'bool' } }
>  
>  ##
>  # @NFSTransport
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
  2016-12-05  8:34 ` [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
@ 2016-12-20 11:23   ` Changlong Xie
  2017-01-13 13:41   ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Changlong Xie @ 2016-12-20 11:23 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, Zhang Chen

On 12/05/2016 04:34 PM, zhanghailiang wrote:
> Introuduce the scenario of shared-disk block replication
> and how to use it.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
> ---
> v2:
> - fix some problems found by Changlong
> ---
>   docs/block-replication.txt | 139 +++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 135 insertions(+), 4 deletions(-)
>
> diff --git a/docs/block-replication.txt b/docs/block-replication.txt
> index 6bde673..fbfe005 100644
> --- a/docs/block-replication.txt
> +++ b/docs/block-replication.txt
> @@ -24,7 +24,7 @@ only dropped at next checkpoint time. To reduce the network transportation
>   effort during a vmstate checkpoint, the disk modification operations of
>   the Primary disk are asynchronously forwarded to the Secondary node.
>
> -== Workflow ==
> +== Non-shared disk workflow ==
>   The following is the image of block replication workflow:
>
>           +----------------------+            +------------------------+
> @@ -57,7 +57,7 @@ The following is the image of block replication workflow:
>       4) Secondary write requests will be buffered in the Disk buffer and it
>          will overwrite the existing sector content in the buffer.
>
> -== Architecture ==
> +== Non-shared disk architecture ==
>   We are going to implement block replication from many basic
>   blocks that are already in QEMU.
>
> @@ -106,6 +106,74 @@ any state that would otherwise be lost by the speculative write-through
>   of the NBD server into the secondary disk. So before block replication,
>   the primary disk and secondary disk should contain the same data.
>
> +== Shared Disk Mode Workflow ==
> +The following is the image of block replication workflow:
> +
> +        +----------------------+            +------------------------+
> +        |Primary Write Requests|            |Secondary Write Requests|
> +        +----------------------+            +------------------------+
> +                  |                                       |
> +                  |                                      (4)
> +                  |                                       V
> +                  |                              /-------------\
> +                  | (2)Forward and write through |             |
> +                  | +--------------------------> | Disk Buffer |
> +                  | |                            |             |
> +                  | |                            \-------------/
> +                  | |(1)read                           |
> +                  | |                                  |
> +       (3)write   | |                                  | backing file
> +                  V |                                  |
> +                 +-----------------------------+       |
> +                 | Shared Disk                 | <-----+
> +                 +-----------------------------+
> +
> +    1) Primary writes will read original data and forward it to Secondary
> +       QEMU.
> +    2) Before Primary write requests are written to Shared disk, the
> +       original sector content will be read from Shared disk and
> +       forwarded and buffered in the Disk buffer on the secondary site,
> +       but it will not overwrite the existing sector content (it could be
> +       from either "Secondary Write Requests" or previous COW of "Primary
> +       Write Requests") in the Disk buffer.
> +    3) Primary write requests will be written to Shared disk.
> +    4) Secondary write requests will be buffered in the Disk buffer and it
> +       will overwrite the existing sector content in the buffer.
> +
> +== Shared Disk Mode Architecture ==
> +We are going to implement block replication from many basic
> +blocks that are already in QEMU.
> +         virtio-blk                     ||                               .----------
> +             /                          ||                               | Secondary
> +            /                           ||                               '----------
> +           /                            ||                                 virtio-blk
> +          /                             ||                                      |
> +          |                             ||                               replication(5)
> +          |                    NBD  -------->   NBD   (2)                       |
> +          |                  client     ||    server ---> hidden disk <-- active disk(4)
> +          |                     ^       ||                      |
> +          |              replication(1) ||                      |
> +          |                     |       ||                      |
> +          |   +-----------------'       ||                      |
> +         (3)  |drive-backup sync=none   ||                      |
> +--------. |   +-----------------+       ||                      |
> +Primary | |                     |       ||           backing    |
> +--------' |                     |       ||                      |
> +          V                     |                               |
> +       +-------------------------------------------+            |
> +       |               shared disk                 | <----------+
> +       +-------------------------------------------+
> +
> +
> +    1) Primary writes will read original data and forward it to Secondary
> +       QEMU.
> +    2) The hidden-disk buffers the original content that is modified by the
> +       primary VM. It should also be an empty disk, and the driver supports
> +       bdrv_make_empty() and backing file.
> +    3) Primary write requests will be written to Shared disk.
> +    4) Secondary write requests will be buffered in the active disk and it
> +       will overwrite the existing sector content in the buffer.
> +
>   == Failure Handling ==
>   There are 7 internal errors when block replication is running:
>   1. I/O error on primary disk
> @@ -145,7 +213,7 @@ d. replication_stop_all()
>      things except failover. The caller must hold the I/O mutex lock if it is
>      in migration/checkpoint thread.
>
> -== Usage ==
> +== Non-shared disk usage ==
>   Primary:
>     -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
>            children.0.file.filename=1.raw,\
> @@ -234,6 +302,69 @@ Secondary:
>     The primary host is down, so we should do the following thing:
>     { 'execute': 'nbd-server-stop' }
>
> +== Shared disk usage ==
> +Primary:
> + -drive if=virtio,id=primary_disk0,file.filename=1.raw,driver=raw
> +
> +Issue qmp command:
> +  { 'execute': 'blockdev-add',
> +    'arguments': {
> +        'driver': 'replication',
> +        'node-name': 'rep',
> +        'mode': 'primary',
> +        'shared-disk-id': 'primary_disk0',
> +        'shared-disk': true,
> +        'file': {
> +            'driver': 'nbd',
> +            'export': 'hidden_disk0',
> +            'server': {
> +                'type': 'inet',
> +                'data': {
> +                    'host': 'xxx.xxx.xxx.xxx',
> +                    'port': 'yyy'
> +                }
> +            }
> +        }
> +     }
> +  }
> +
> +Secondary:
> + -drive if=none,driver=qcow2,file.filename=/mnt/ramfs/hidden_disk.img,id=hidden_disk0,\
> +        backing.driver=raw,backing.file.filename=1.raw \
> + -drive if=virtio,id=active-disk0,driver=replication,mode=secondary,\
> +        file.driver=qcow2,top-id=active-disk0,\
> +        file.file.filename=/mnt/ramfs/active_disk.img,\
> +        file.backing=hidden_disk0,shared-disk=on
> +
> +Issue qmp command:
> +1. { 'execute': 'nbd-server-start',
> +     'arguments': {
> +        'addr': {
> +            'type': 'inet',
> +            'data': {
> +                'host': '0',
> +                'port': 'yyy'
> +            }
> +        }
> +     }
> +   }
> +2. { 'execute': 'nbd-server-add',
> +     'arguments': {
> +        'device': 'hidden_disk0',
> +        'writable': true
> +    }
> +  }
> +
> +After Failover:
> +Primary:
> +  { 'execute': 'x-blockdev-del',
> +    'arguments': {
> +        'node-name': 'rep'
> +    }
> +  }
> +
> +Secondary:
> +  {'execute': 'nbd-server-stop' }
> +
>   TODO:
>   1. Continuous block replication
> -2. Shared disk
>

Looks good to me

Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
  2016-12-05 16:22   ` Eric Blake
@ 2016-12-20 11:34   ` Changlong Xie
  2017-01-17 11:25   ` Stefan Hajnoczi
  2 siblings, 0 replies; 24+ messages in thread
From: Changlong Xie @ 2016-12-20 11:34 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency, Zhang Chen

On 12/05/2016 04:35 PM, zhanghailiang wrote:
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index c29bef7..52d7e0d 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2232,12 +2232,19 @@
>   #          node who owns the replication node chain. Must not be given in
>   #          primary mode.
>   #
> +# @shared-disk-id: #optional The id of shared disk while in replication mode.
> +#
> +# @shared-disk: #optional To indicate whether or not a disk is shared by
> +#               primary VM and secondary VM.
> +#

I think we need more detailed description here.

For @shared-disk, we can only both enable or disable it on both side.
For @shared-disk-id, it must/only be given when @shared-disk enable on 
Primary side.

More, you also need to perfect the replication_open() logic.

>   # Since: 2.8
>   ##
>   { 'struct': 'BlockdevOptionsReplication',
>     'base': 'BlockdevOptionsGenericFormat',
>     'data': { 'mode': 'ReplicationMode',
> -            '*top-id': 'str' } }
> +            '*top-id': 'str',
> +            '*shared-disk-id': 'str',
> +            '*shared-disk': 'bool' } }
>
>   ##
>   # @NFSTransport
> --

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
@ 2016-12-20 12:41   ` Changlong Xie
  2017-01-17 13:10   ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Changlong Xie @ 2016-12-20 12:41 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency

On 12/05/2016 04:35 PM, zhanghailiang wrote:
> The helper backup_do_checkpoint() will be used for primary related
> codes. Here we split it out from secondary_do_checkpoint().
>
> Besides, it is unnecessary to call backup_do_checkpoint() in
> replication starting and normally stop replication path.
> We only need call it while do real checkpointing.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>   block/replication.c | 36 +++++++++++++++++++-----------------
>   1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/block/replication.c b/block/replication.c
> index e87ae87..35e9ab3 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -332,20 +332,8 @@ static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
>
>   static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
>   {
> -    Error *local_err = NULL;
>       int ret;
>
> -    if (!s->secondary_disk->bs->job) {
> -        error_setg(errp, "Backup job was cancelled unexpectedly");
> -        return;
> -    }
> -
> -    backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        return;
> -    }
> -
>       ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
>       if (ret < 0) {
>           error_setg(errp, "Cannot make active disk empty");
> @@ -558,6 +546,8 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>               return;
>           }
>           block_job_start(job);
> +
> +        secondary_do_checkpoint(s, errp);
>           break;
>       default:
>           aio_context_release(aio_context);
> @@ -566,10 +556,6 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>
>       s->replication_state = BLOCK_REPLICATION_RUNNING;
>
> -    if (s->mode == REPLICATION_MODE_SECONDARY) {
> -        secondary_do_checkpoint(s, errp);
> -    }
> -
>       s->error = 0;
>       aio_context_release(aio_context);
>   }
> @@ -579,13 +565,29 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
>       BlockDriverState *bs = rs->opaque;
>       BDRVReplicationState *s;
>       AioContext *aio_context;
> +    Error *local_err = NULL;
>
>       aio_context = bdrv_get_aio_context(bs);
>       aio_context_acquire(aio_context);
>       s = bs->opaque;
>
> -    if (s->mode == REPLICATION_MODE_SECONDARY) {
> +    switch (s->mode) {
> +    case REPLICATION_MODE_PRIMARY:
> +        break;
> +    case REPLICATION_MODE_SECONDARY:
> +        if (!s->secondary_disk->bs->job) {
> +            error_setg(errp, "Backup job was cancelled unexpectedly");
> +            break;
> +        }
> +        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            break;
> +        }
>           secondary_do_checkpoint(s, errp);
> +        break;
> +    default:
> +        abort();
>       }
>       aio_context_release(aio_context);
>   }
>

Looks good to me

Reviewed-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
@ 2016-12-20 12:42   ` Changlong Xie
  2017-01-18  6:53     ` Hailiang Zhang
  2017-01-17 13:15   ` Stefan Hajnoczi
  1 sibling, 1 reply; 24+ messages in thread
From: Changlong Xie @ 2016-12-20 12:42 UTC (permalink / raw)
  To: zhanghailiang, qemu-devel, qemu-block
  Cc: stefanha, kwolf, mreitz, pbonzini, wency

On 12/05/2016 04:35 PM, zhanghailiang wrote:
> Some code logic only be needed in non-shared disk, here
> we adjust these codes to prepare for shared disk scenario.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>   block/replication.c | 47 ++++++++++++++++++++++++++++-------------------
>   1 file changed, 28 insertions(+), 19 deletions(-)
>
> diff --git a/block/replication.c b/block/replication.c
> index 35e9ab3..6574cc2 100644
> --- a/block/replication.c
> +++ b/block/replication.c
> @@ -531,21 +531,28 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>               aio_context_release(aio_context);
>               return;
>           }
> -        bdrv_op_block_all(top_bs, s->blocker);
> -        bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
>
> -        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
> -                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
> +        /*
> +         * Only in the case of non-shared disk,
> +         * the backup job is in the secondary side
> +         */
> +        if (!s->is_shared_disk) {
> +            bdrv_op_block_all(top_bs, s->blocker);
> +            bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
> +            job = backup_job_create(NULL, s->secondary_disk->bs,
> +                                s->hidden_disk->bs, 0,
> +                                MIRROR_SYNC_MODE_NONE, NULL, false,
>                                   BLOCKDEV_ON_ERROR_REPORT,
>                                   BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
>                                   backup_job_completed, bs, NULL, &local_err);

Coding style here.

> -        if (local_err) {
> -            error_propagate(errp, local_err);
> -            backup_job_cleanup(bs);
> -            aio_context_release(aio_context);
> -            return;
> +            if (local_err) {
> +                error_propagate(errp, local_err);
> +                backup_job_cleanup(bs);
> +                aio_context_release(aio_context);
> +                return;
> +            }
> +            block_job_start(job);
>           }
> -        block_job_start(job);
>
>           secondary_do_checkpoint(s, errp);
>           break;
> @@ -575,14 +582,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
>       case REPLICATION_MODE_PRIMARY:
>           break;
>       case REPLICATION_MODE_SECONDARY:
> -        if (!s->secondary_disk->bs->job) {
> -            error_setg(errp, "Backup job was cancelled unexpectedly");
> -            break;
> -        }
> -        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
> -        if (local_err) {
> -            error_propagate(errp, local_err);
> -            break;
> +        if (!s->is_shared_disk) {
> +            if (!s->secondary_disk->bs->job) {
> +                error_setg(errp, "Backup job was cancelled unexpectedly");
> +                break;
> +            }
> +            backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
> +            if (local_err) {
> +                error_propagate(errp, local_err);
> +                break;
> +            }
>           }
>           secondary_do_checkpoint(s, errp);
>           break;
> @@ -663,7 +672,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
>            * before the BDS is closed, because we will access hidden
>            * disk, secondary disk in backup_job_completed().
>            */
> -        if (s->secondary_disk->bs->job) {
> +        if (!s->is_shared_disk && s->secondary_disk->bs->job) {
>               block_job_cancel_sync(s->secondary_disk->bs->job);
>           }
>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
  2016-12-05  8:34 ` [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
  2016-12-20 11:23   ` Changlong Xie
@ 2017-01-13 13:41   ` Stefan Hajnoczi
  2017-01-19  2:50     ` Hailiang Zhang
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2017-01-13 13:41 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
> +Issue qmp command:
> +  { 'execute': 'blockdev-add',
> +    'arguments': {
> +        'driver': 'replication',
> +        'node-name': 'rep',
> +        'mode': 'primary',
> +        'shared-disk-id': 'primary_disk0',
> +        'shared-disk': true,
> +        'file': {
> +            'driver': 'nbd',
> +            'export': 'hidden_disk0',
> +            'server': {
> +                'type': 'inet',
> +                'data': {
> +                    'host': 'xxx.xxx.xxx.xxx',
> +                    'port': 'yyy'
> +                }
> +            }

block/nbd.c does have good error handling and recovery in case there is
a network issue.  There are no reconnection attempts or timeouts that
deal with a temporary loss of network connectivity.

This is a general problem with block/nbd.c and not something to solve in
this patch series.  I'm just mentioning it because it may affect COLO
replication.

I'm sure these limitations in block/nbd.c can be fixed but it will take
some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
code could also benefit from generic network connection recovery.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
  2016-12-05 16:22   ` Eric Blake
  2016-12-20 11:34   ` Changlong Xie
@ 2017-01-17 11:25   ` Stefan Hajnoczi
  2017-01-18  6:54     ` Hailiang Zhang
  2 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2017-01-17 11:25 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

[-- Attachment #1: Type: text/plain, Size: 1469 bytes --]

On Mon, Dec 05, 2016 at 04:35:00PM +0800, zhanghailiang wrote:
> @@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
>      QemuOpts *opts = NULL;
>      const char *mode;
>      const char *top_id;
> +    const char *shared_disk_id;
> +    BlockBackend *blk;
> +    BlockDriverState *tmp_bs;
>  
>      ret = -EINVAL;
>      opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
> @@ -119,6 +136,25 @@ static int replication_open(BlockDriverState *bs, QDict *options,
>                     "The option mode's value should be primary or secondary");
>          goto fail;
>      }
> +    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
> +                                          false);
> +    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
> +        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
> +        if (!shared_disk_id) {
> +            error_setg(&local_err, "Missing shared disk blk");
> +            goto fail;
> +        }
> +        s->shared_disk_id = g_strdup(shared_disk_id);
> +        blk = blk_by_name(s->shared_disk_id);
> +        if (!blk) {
> +            g_free(s->shared_disk_id);
> +            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
> +            goto fail;

Please move the g_free() to the fail label to prevent future code
changes from introducing a memory leak.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint()
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
  2016-12-20 12:41   ` Changlong Xie
@ 2017-01-17 13:10   ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2017-01-17 13:10 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency, xiecl.fnst

[-- Attachment #1: Type: text/plain, Size: 628 bytes --]

On Mon, Dec 05, 2016 at 04:35:01PM +0800, zhanghailiang wrote:
> The helper backup_do_checkpoint() will be used for primary related
> codes. Here we split it out from secondary_do_checkpoint().
> 
> Besides, it is unnecessary to call backup_do_checkpoint() in
> replication starting and normally stop replication path.
> We only need call it while do real checkpointing.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  block/replication.c | 36 +++++++++++++++++++-----------------
>  1 file changed, 19 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
  2016-12-20 12:42   ` Changlong Xie
@ 2017-01-17 13:15   ` Stefan Hajnoczi
  1 sibling, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2017-01-17 13:15 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency, xiecl.fnst

[-- Attachment #1: Type: text/plain, Size: 446 bytes --]

On Mon, Dec 05, 2016 at 04:35:02PM +0800, zhanghailiang wrote:
> Some code logic only be needed in non-shared disk, here
> we adjust these codes to prepare for shared disk scenario.
> 
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> ---
>  block/replication.c | 47 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case
  2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case zhanghailiang
@ 2017-01-17 13:19   ` Stefan Hajnoczi
  2017-01-18  6:53     ` Hailiang Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2017-01-17 13:19 UTC (permalink / raw)
  To: zhanghailiang
  Cc: qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

On Mon, Dec 05, 2016 at 04:35:03PM +0800, zhanghailiang wrote:
> @@ -663,8 +695,12 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
>  
>      switch (s->mode) {
>      case REPLICATION_MODE_PRIMARY:
> -        s->replication_state = BLOCK_REPLICATION_DONE;
> -        s->error = 0;
> +        if (s->is_shared_disk && s->primary_disk->bs->job) {
> +            block_job_cancel(s->primary_disk->bs->job);

Should this be block_job_cancel_sync()?

> +        } else {
> +            s->replication_state = BLOCK_REPLICATION_DONE;
> +            s->error = 0;
> +        }
>          break;
>      case REPLICATION_MODE_SECONDARY:
>          /*
> -- 
> 1.8.3.1
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case
  2017-01-17 13:19   ` Stefan Hajnoczi
@ 2017-01-18  6:53     ` Hailiang Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Hailiang Zhang @ 2017-01-18  6:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

Hi Stefan,

On 2017/1/17 21:19, Stefan Hajnoczi wrote:
> On Mon, Dec 05, 2016 at 04:35:03PM +0800, zhanghailiang wrote:
>> @@ -663,8 +695,12 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
>>
>>       switch (s->mode) {
>>       case REPLICATION_MODE_PRIMARY:
>> -        s->replication_state = BLOCK_REPLICATION_DONE;
>> -        s->error = 0;
>> +        if (s->is_shared_disk && s->primary_disk->bs->job) {
>> +            block_job_cancel(s->primary_disk->bs->job);
>
> Should this be block_job_cancel_sync()?
>

No, here it is different from the secondary side which needs to wait
until backup job been canceled before resumes to run (Or there will be
an error, https://patchwork.kernel.org/patch/9128841/).

For primary VM, Just as you can see the design scenario in patch 1,
It accesses the shared disk directly, the backup job whose source side
is just the shared disk does not influence primary VM's running,
So IMHO, it is safe to call block_job_cancel here.

Thanks,
Hailiang


>> +        } else {
>> +            s->replication_state = BLOCK_REPLICATION_DONE;
>> +            s->error = 0;
>> +        }
>>           break;
>>       case REPLICATION_MODE_SECONDARY:
>>           /*
>> --
>> 1.8.3.1
>>
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option
  2016-12-20 12:42   ` Changlong Xie
@ 2017-01-18  6:53     ` Hailiang Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Hailiang Zhang @ 2017-01-18  6:53 UTC (permalink / raw)
  To: Changlong Xie, qemu-devel, qemu-block
  Cc: xuquan8, stefanha, kwolf, mreitz, pbonzini, wency

On 2016/12/20 20:42, Changlong Xie wrote:
> On 12/05/2016 04:35 PM, zhanghailiang wrote:
>> Some code logic only be needed in non-shared disk, here
>> we adjust these codes to prepare for shared disk scenario.
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> ---
>>    block/replication.c | 47 ++++++++++++++++++++++++++++-------------------
>>    1 file changed, 28 insertions(+), 19 deletions(-)
>>
>> diff --git a/block/replication.c b/block/replication.c
>> index 35e9ab3..6574cc2 100644
>> --- a/block/replication.c
>> +++ b/block/replication.c
>> @@ -531,21 +531,28 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
>>                aio_context_release(aio_context);
>>                return;
>>            }
>> -        bdrv_op_block_all(top_bs, s->blocker);
>> -        bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
>>
>> -        job = backup_job_create(NULL, s->secondary_disk->bs, s->hidden_disk->bs,
>> -                                0, MIRROR_SYNC_MODE_NONE, NULL, false,
>> +        /*
>> +         * Only in the case of non-shared disk,
>> +         * the backup job is in the secondary side
>> +         */
>> +        if (!s->is_shared_disk) {
>> +            bdrv_op_block_all(top_bs, s->blocker);
>> +            bdrv_op_unblock(top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
>> +            job = backup_job_create(NULL, s->secondary_disk->bs,
>> +                                s->hidden_disk->bs, 0,
>> +                                MIRROR_SYNC_MODE_NONE, NULL, false,
>>                                    BLOCKDEV_ON_ERROR_REPORT,
>>                                    BLOCKDEV_ON_ERROR_REPORT, BLOCK_JOB_INTERNAL,
>>                                    backup_job_completed, bs, NULL, &local_err);
>
> Coding style here.
>

Will fix it in next version, thanks.

>> -        if (local_err) {
>> -            error_propagate(errp, local_err);
>> -            backup_job_cleanup(bs);
>> -            aio_context_release(aio_context);
>> -            return;
>> +            if (local_err) {
>> +                error_propagate(errp, local_err);
>> +                backup_job_cleanup(bs);
>> +                aio_context_release(aio_context);
>> +                return;
>> +            }
>> +            block_job_start(job);
>>            }
>> -        block_job_start(job);
>>
>>            secondary_do_checkpoint(s, errp);
>>            break;
>> @@ -575,14 +582,16 @@ static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
>>        case REPLICATION_MODE_PRIMARY:
>>            break;
>>        case REPLICATION_MODE_SECONDARY:
>> -        if (!s->secondary_disk->bs->job) {
>> -            error_setg(errp, "Backup job was cancelled unexpectedly");
>> -            break;
>> -        }
>> -        backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
>> -        if (local_err) {
>> -            error_propagate(errp, local_err);
>> -            break;
>> +        if (!s->is_shared_disk) {
>> +            if (!s->secondary_disk->bs->job) {
>> +                error_setg(errp, "Backup job was cancelled unexpectedly");
>> +                break;
>> +            }
>> +            backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
>> +            if (local_err) {
>> +                error_propagate(errp, local_err);
>> +                break;
>> +            }
>>            }
>>            secondary_do_checkpoint(s, errp);
>>            break;
>> @@ -663,7 +672,7 @@ static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
>>             * before the BDS is closed, because we will access hidden
>>             * disk, secondary disk in backup_job_completed().
>>             */
>> -        if (s->secondary_disk->bs->job) {
>> +        if (!s->is_shared_disk && s->secondary_disk->bs->job) {
>>                block_job_cancel_sync(s->secondary_disk->bs->job);
>>            }
>>
>>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options
  2017-01-17 11:25   ` Stefan Hajnoczi
@ 2017-01-18  6:54     ` Hailiang Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Hailiang Zhang @ 2017-01-18  6:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

On 2017/1/17 19:25, Stefan Hajnoczi wrote:
> On Mon, Dec 05, 2016 at 04:35:00PM +0800, zhanghailiang wrote:
>> @@ -85,6 +99,9 @@ static int replication_open(BlockDriverState *bs, QDict *options,
>>       QemuOpts *opts = NULL;
>>       const char *mode;
>>       const char *top_id;
>> +    const char *shared_disk_id;
>> +    BlockBackend *blk;
>> +    BlockDriverState *tmp_bs;
>>
>>       ret = -EINVAL;
>>       opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
>> @@ -119,6 +136,25 @@ static int replication_open(BlockDriverState *bs, QDict *options,
>>                      "The option mode's value should be primary or secondary");
>>           goto fail;
>>       }
>> +    s->is_shared_disk = qemu_opt_get_bool(opts, REPLICATION_SHARED_DISK,
>> +                                          false);
>> +    if (s->is_shared_disk && (s->mode == REPLICATION_MODE_PRIMARY)) {
>> +        shared_disk_id = qemu_opt_get(opts, REPLICATION_SHARED_DISK_ID);
>> +        if (!shared_disk_id) {
>> +            error_setg(&local_err, "Missing shared disk blk");
>> +            goto fail;
>> +        }
>> +        s->shared_disk_id = g_strdup(shared_disk_id);
>> +        blk = blk_by_name(s->shared_disk_id);
>> +        if (!blk) {
>> +            g_free(s->shared_disk_id);
>> +            error_setg(&local_err, "There is no %s block", s->shared_disk_id);
>> +            goto fail;
>
> Please move the g_free() to the fail label to prevent future code
> changes from introducing a memory leak.
>

OK, I will fix it in next version, thanks very much.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options
  2016-12-05 16:22   ` Eric Blake
@ 2017-01-18  6:58     ` Hailiang Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Hailiang Zhang @ 2017-01-18  6:58 UTC (permalink / raw)
  To: Eric Blake, qemu-devel, qemu-block
  Cc: xuquan8, kwolf, xiecl.fnst, Zhang Chen, wency, mreitz, stefanha,
	pbonzini

On 2016/12/6 0:22, Eric Blake wrote:
> On 12/05/2016 02:35 AM, zhanghailiang wrote:
>> We use these two options to identify which disk is
>> shared
>>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
>> ---
>> v2:
>> - add these two options for BlockdevOptionsReplication to support
>>    qmp blockdev-add command.
>> - fix a memory leak found by Changlong
>> ---
>
>> +++ b/qapi/block-core.json
>> @@ -2232,12 +2232,19 @@
>>   #          node who owns the replication node chain. Must not be given in
>>   #          primary mode.
>>   #
>> +# @shared-disk-id: #optional The id of shared disk while in replication mode.
>> +#
>> +# @shared-disk: #optional To indicate whether or not a disk is shared by
>> +#               primary VM and secondary VM.
>
> Both of these need a '(since 2.9)' designation since they've missed 2.8,
> and could probably also use documentation on the default value assumed
> when the parameter is omitted.
>

OK, i will add it in next version, thanks.

>> +#
>>   # Since: 2.8
>>   ##
>>   { 'struct': 'BlockdevOptionsReplication',
>>     'base': 'BlockdevOptionsGenericFormat',
>>     'data': { 'mode': 'ReplicationMode',
>> -            '*top-id': 'str' } }
>> +            '*top-id': 'str',
>> +            '*shared-disk-id': 'str',
>> +            '*shared-disk': 'bool' } }
>>
>>   ##
>>   # @NFSTransport
>>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
  2017-01-13 13:41   ` Stefan Hajnoczi
@ 2017-01-19  2:50     ` Hailiang Zhang
  2017-01-19 16:41       ` Stefan Hajnoczi
  0 siblings, 1 reply; 24+ messages in thread
From: Hailiang Zhang @ 2017-01-19  2:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

On 2017/1/13 21:41, Stefan Hajnoczi wrote:
> On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
>> +Issue qmp command:
>> +  { 'execute': 'blockdev-add',
>> +    'arguments': {
>> +        'driver': 'replication',
>> +        'node-name': 'rep',
>> +        'mode': 'primary',
>> +        'shared-disk-id': 'primary_disk0',
>> +        'shared-disk': true,
>> +        'file': {
>> +            'driver': 'nbd',
>> +            'export': 'hidden_disk0',
>> +            'server': {
>> +                'type': 'inet',
>> +                'data': {
>> +                    'host': 'xxx.xxx.xxx.xxx',
>> +                    'port': 'yyy'
>> +                }
>> +            }
>
> block/nbd.c does have good error handling and recovery in case there is
> a network issue.  There are no reconnection attempts or timeouts that
> deal with a temporary loss of network connectivity.
>
> This is a general problem with block/nbd.c and not something to solve in
> this patch series.  I'm just mentioning it because it may affect COLO
> replication.
>
> I'm sure these limitations in block/nbd.c can be fixed but it will take
> some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
> code could also benefit from generic network connection recovery.
>

Hmm, good suggestion, but IMHO, here, COLO is a little different from
other scenes, if the reconnection method has been implemented,
it still needs a mechanism to identify the temporary loss of network
connection or real broken in network connection.

I did a simple test, just ifconfig down the network card that be used
by block replication, It seems that NBD in qemu doesn't has a ability to
find the connection has been broken, there was no error reports
and COLO just got stuck in vm_stop() where it called aio_poll().

Thanks,
Hailiang



> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
  2017-01-19  2:50     ` Hailiang Zhang
@ 2017-01-19 16:41       ` Stefan Hajnoczi
  2017-01-20  2:35         ` Hailiang Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2017-01-19 16:41 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

[-- Attachment #1: Type: text/plain, Size: 2539 bytes --]

On Thu, Jan 19, 2017 at 10:50:19AM +0800, Hailiang Zhang wrote:
> On 2017/1/13 21:41, Stefan Hajnoczi wrote:
> > On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
> > > +Issue qmp command:
> > > +  { 'execute': 'blockdev-add',
> > > +    'arguments': {
> > > +        'driver': 'replication',
> > > +        'node-name': 'rep',
> > > +        'mode': 'primary',
> > > +        'shared-disk-id': 'primary_disk0',
> > > +        'shared-disk': true,
> > > +        'file': {
> > > +            'driver': 'nbd',
> > > +            'export': 'hidden_disk0',
> > > +            'server': {
> > > +                'type': 'inet',
> > > +                'data': {
> > > +                    'host': 'xxx.xxx.xxx.xxx',
> > > +                    'port': 'yyy'
> > > +                }
> > > +            }
> > 
> > block/nbd.c does have good error handling and recovery in case there is
> > a network issue.  There are no reconnection attempts or timeouts that
> > deal with a temporary loss of network connectivity.
> > 
> > This is a general problem with block/nbd.c and not something to solve in
> > this patch series.  I'm just mentioning it because it may affect COLO
> > replication.
> > 
> > I'm sure these limitations in block/nbd.c can be fixed but it will take
> > some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
> > code could also benefit from generic network connection recovery.
> > 
> 
> Hmm, good suggestion, but IMHO, here, COLO is a little different from
> other scenes, if the reconnection method has been implemented,
> it still needs a mechanism to identify the temporary loss of network
> connection or real broken in network connection.
> 
> I did a simple test, just ifconfig down the network card that be used
> by block replication, It seems that NBD in qemu doesn't has a ability to
> find the connection has been broken, there was no error reports
> and COLO just got stuck in vm_stop() where it called aio_poll().

Yes, this is the vm_stop() problem again.  There is no reliable way to
cancel I/O requests so instead QEMU waits...forever.  A solution is
needed so COLO doesn't hang on network failure.

I'm not sure how to solve the problem.  The secondary still has the last
successful checkpoint so it could resume instead of waiting for the
current checkpoint to commit.

There may still be NBD I/O in flight, so the would need to drain it or
fence storage to prevent interference once the secondary VM is running.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case
  2017-01-19 16:41       ` Stefan Hajnoczi
@ 2017-01-20  2:35         ` Hailiang Zhang
  0 siblings, 0 replies; 24+ messages in thread
From: Hailiang Zhang @ 2017-01-20  2:35 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: xuquan8, qemu-devel, qemu-block, kwolf, mreitz, pbonzini, wency,
	xiecl.fnst, Zhang Chen

On 2017/1/20 0:41, Stefan Hajnoczi wrote:
> On Thu, Jan 19, 2017 at 10:50:19AM +0800, Hailiang Zhang wrote:
>> On 2017/1/13 21:41, Stefan Hajnoczi wrote:
>>> On Mon, Dec 05, 2016 at 04:34:59PM +0800, zhanghailiang wrote:
>>>> +Issue qmp command:
>>>> +  { 'execute': 'blockdev-add',
>>>> +    'arguments': {
>>>> +        'driver': 'replication',
>>>> +        'node-name': 'rep',
>>>> +        'mode': 'primary',
>>>> +        'shared-disk-id': 'primary_disk0',
>>>> +        'shared-disk': true,
>>>> +        'file': {
>>>> +            'driver': 'nbd',
>>>> +            'export': 'hidden_disk0',
>>>> +            'server': {
>>>> +                'type': 'inet',
>>>> +                'data': {
>>>> +                    'host': 'xxx.xxx.xxx.xxx',
>>>> +                    'port': 'yyy'
>>>> +                }
>>>> +            }
>>>
>>> block/nbd.c does have good error handling and recovery in case there is
>>> a network issue.  There are no reconnection attempts or timeouts that
>>> deal with a temporary loss of network connectivity.
>>>
>>> This is a general problem with block/nbd.c and not something to solve in
>>> this patch series.  I'm just mentioning it because it may affect COLO
>>> replication.
>>>
>>> I'm sure these limitations in block/nbd.c can be fixed but it will take
>>> some effort.  Maybe block/sheepdog.c, net/socket.c, and other network
>>> code could also benefit from generic network connection recovery.
>>>
>>
>> Hmm, good suggestion, but IMHO, here, COLO is a little different from
>> other scenes, if the reconnection method has been implemented,
>> it still needs a mechanism to identify the temporary loss of network
>> connection or real broken in network connection.
>>
>> I did a simple test, just ifconfig down the network card that be used
>> by block replication, It seems that NBD in qemu doesn't has a ability to
>> find the connection has been broken, there was no error reports
>> and COLO just got stuck in vm_stop() where it called aio_poll().
>
> Yes, this is the vm_stop() problem again.  There is no reliable way to
> cancel I/O requests so instead QEMU waits...forever.  A solution is
> needed so COLO doesn't hang on network failure.
>

Yes, COLO needs to detect this situation and cancel the requests in a proper
way.

> I'm not sure how to solve the problem.  The secondary still has the last
> successful checkpoint so it could resume instead of waiting for the
> current checkpoint to commit.
>
> There may still be NBD I/O in flight, so the would need to drain it or
> fence storage to prevent interference once the secondary VM is running.
>

Agreed, we need to think this carefully. We'll put these reliabilities
developing in future after COLO's basic function completed.

Thanks,
Hailiang

> Stefan
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-01-20  2:36 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-05  8:34 [Qemu-devel] [PATCH RFC v2 0/6] COLO block replication supports shared disk case zhanghailiang
2016-12-05  8:34 ` [Qemu-devel] [PATCH RFC v2 1/6] docs/block-replication: Add description for shared-disk case zhanghailiang
2016-12-20 11:23   ` Changlong Xie
2017-01-13 13:41   ` Stefan Hajnoczi
2017-01-19  2:50     ` Hailiang Zhang
2017-01-19 16:41       ` Stefan Hajnoczi
2017-01-20  2:35         ` Hailiang Zhang
2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 2/6] replication: add shared-disk and shared-disk-id options zhanghailiang
2016-12-05 16:22   ` Eric Blake
2017-01-18  6:58     ` Hailiang Zhang
2016-12-20 11:34   ` Changlong Xie
2017-01-17 11:25   ` Stefan Hajnoczi
2017-01-18  6:54     ` Hailiang Zhang
2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 3/6] replication: Split out backup_do_checkpoint() from secondary_do_checkpoint() zhanghailiang
2016-12-20 12:41   ` Changlong Xie
2017-01-17 13:10   ` Stefan Hajnoczi
2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 4/6] replication: fix code logic with the new shared_disk option zhanghailiang
2016-12-20 12:42   ` Changlong Xie
2017-01-18  6:53     ` Hailiang Zhang
2017-01-17 13:15   ` Stefan Hajnoczi
2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 5/6] replication: Implement block replication for shared disk case zhanghailiang
2017-01-17 13:19   ` Stefan Hajnoczi
2017-01-18  6:53     ` Hailiang Zhang
2016-12-05  8:35 ` [Qemu-devel] [PATCH RFC v2 6/6] nbd/replication: implement .bdrv_get_info() for nbd and replication driver zhanghailiang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.