qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints
@ 2016-02-05  4:17 Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 1/9] unblock backup operations in backing file Changlong Xie
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:17 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

You can get the detailed information about block replication from here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

Patch status:
1. Acked patches: none 
2. Reviewed patches: patch 4
3. Updated patches: patch 3, 4, 5, 7, 8
Note: patch 7 is a new patch that introduces genernal start/stop/checkpoint/
get_error
APIs.

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v15

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v14

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Changs Log:
V15:
1. Rebase to the newest codes
2. Fix typos and coding style addresed Eric's comments
3. Address Stefan's comments
   1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
   2) Update the message and description for [PATCH 4/9]
   3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
   4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
   5) Use BdrvChild instead of holding on to BlockDriverState * pointers
4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771  
5. Introduce replication_get_error_all to check replication status
6. Remove useless discard interface
V14:
1. Implement auto complete active commit
2. Implement active commit block job for replication.c
3. Address the comments from Stefan, add replication-specific API and data
   structure, also remove old block layer APIs
V13:
1. Rebase to the newest codes
2. Remove redundant marcos and semicolon in replication.c 
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replcace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting blcok replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Changlong Xie (1):
  Introduce new APIs to do replication operation

Wen Congyang (8):
  unblock backup operations in backing file
  Store parent BDS in BdrvChild
  Backup: clear all bitmap when doing block checkpoint
  Link backup into block core
  docs: block replication's description
  auto complete active commit
  Implement new driver for block replication
  support replication driver in blockdev-add

 Makefile.objs              |   1 +
 block.c                    |  19 ++
 block/Makefile.objs        |   3 +-
 block/backup.c             |  15 ++
 block/mirror.c             |  13 +-
 block/replication.c        | 594 +++++++++++++++++++++++++++++++++++++++++++++
 blockdev.c                 |   2 +-
 docs/block-replication.txt | 238 ++++++++++++++++++
 include/block/block_int.h  |   6 +-
 qapi/block-core.json       |  33 ++-
 qemu-img.c                 |   2 +-
 replication.c              |  94 +++++++
 replication.h              |  53 ++++
 13 files changed, 1063 insertions(+), 10 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt
 create mode 100644 replication.c
 create mode 100644 replication.h

-- 
1.9.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 1/9] unblock backup operations in backing file
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 2/9] Store parent BDS in BdrvChild Changlong Xie
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index a285de5..70ab625 100644
--- a/block.c
+++ b/block.c
@@ -1279,6 +1279,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+    /*
+     * We do backup in 3 ways:
+     * 1. drive backup
+     *    The target bs is new opened, and the source is top BDS
+     * 2. blockdev backup
+     *    Both the source and the target are top BDSes.
+     * 3. internal backup(used for block replication)
+     *    Both the source and the target are backing file
+     *
+     * In case 1, and 2, the backing file is neither the source nor
+     * the target.
+     * In case 3, we will block the top BDS, so there is only one block
+     * job for the top BDS and its backing chain.
+     */
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 2/9] Store parent BDS in BdrvChild
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 1/9] unblock backup operations in backing file Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 3/9] Backup: clear all bitmap when doing block checkpoint Changlong Xie
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

We need to access the parent BDS to get the root BDS.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block.c                   | 1 +
 include/block/block_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 70ab625..c18b462 100644
--- a/block.c
+++ b/block.c
@@ -1208,6 +1208,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     BdrvChild *child = g_new(BdrvChild, 1);
     *child = (BdrvChild) {
         .bs     = child_bs,
+        .parent = parent_bs,
         .name   = g_strdup(child_name),
         .role   = child_role,
     };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 89ec138..9204d1e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -363,6 +363,7 @@ extern const BdrvChildRole child_format;
 
 struct BdrvChild {
     BlockDriverState *bs;
+    BlockDriverState *parent;
     char *name;
     const BdrvChildRole *role;
     QLIST_ENTRY(BdrvChild) next;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 3/9] Backup: clear all bitmap when doing block checkpoint
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 1/9] unblock backup operations in backing file Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 2/9] Store parent BDS in BdrvChild Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 4/9] Link backup into block core Changlong Xie
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block/backup.c            | 15 +++++++++++++++
 include/block/block_int.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 00cafdb..841ec27 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -251,6 +251,21 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9204d1e..e67b6fb 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -700,6 +700,8 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target,
                   BlockCompletionFunc *cb, void *opaque,
                   BlockJobTxn *txn, Error **errp);
 
+void backup_do_checkpoint(BlockJob *job, Error **errp);
+
 void blk_set_bs(BlockBackend *blk, BlockDriverState *bs);
 
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 4/9] Link backup into block core
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (2 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 3/9] Backup: clear all bitmap when doing block checkpoint Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 5/9] docs: block replication's description Changlong Xie
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Some programs that add a dependency on it will use
the block layer directly.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 5/9] docs: block replication's description
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (3 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 4/9] Link backup into block core Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 6/9] auto complete active commit Changlong Xie
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 docs/block-replication.txt | 238 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 238 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..d8e20c7
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,238 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2016
+Copyright (c) 2016 Intel Corporation
+Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of the Primary and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort during a vmstate checkpoint, the disk modification operations of
+the Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content (it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none 6
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The new block filter (the name is replication) will control the block
+replication.
+
+3) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+4) The disk on the secondary is represented by a custom block device
+(called active-disk). It should start as an empty disk, and the format
+should support bdrv_make_empty() and backing file.
+
+5) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also start as an empty disk,
+and the driver supports bdrv_make_empty() and backing file.
+
+6) The drive-backup job (sync=none) is run to allow hidden-disk to buffer
+any state that would otherwise be lost by the speculative write-through
+of the NBD server into the secondary disk. So before block replication,
+the primary disk and secondary disk should contain the same data.
+
+== Failure Handling ==
+There are 7 internal errors when block replication is running:
+1. I/O error on primary disk
+2. Forwarding primary write requests failed
+3. Backup failed
+4. I/O error on secondary disk
+5. I/O error on active disk
+6. Making active disk or hidden disk empty failed
+7. Doing failover failed
+In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
+4 and 6, we just report block replication's error to FT/HA manager (which
+decides when to do a new checkpoint, when to do failover).
+In case 7, if active commit failed, we use replication failover failed state
+in Secondary's write operation (what decides which target to write).
+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. replication_start_all()
+   Start block replication, called in migration/checkpoint thread.
+   We must call block_replication_start_all() in secondary QEMU before
+   calling block_replication_start_all() in primary QEMU. The caller
+   must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+b. replication_do_checkpoint_all()
+   This interface is called after all VM state is transferred to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. replication_get_error_all()
+   This interface is called to check if error happened in replication.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+d. replication_stop_all()
+   It is called on failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it if you use this API to shutdown the guest, or other
+   things except failover. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw
+
+  Run qmp command in primary qemu:
+    { 'execute': 'human-monitor-command',
+      'arguments': {
+          'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=xxxx,file.port=xxxx,file.export=colo1,node-name=nbd_client1,if=none,id=xxxx'
+      }
+    }
+    { 'execute': 'x-blockdev-change',
+      'arguments': {
+          'parent': 'colo1',
+          'node': 'nbd_client1'
+      }
+    }
+  Note:
+  1. There should be only one NBD Client for each primary disk.
+  2. host is the secondary physical machine's hostname or IP
+  3. Each disk must have its own export name.
+  4. It is all a single argument to -drive and you should ignore the
+     leading whitespace.
+  5. The qmp command line must be run after running qmp command line in
+     secondary qemu.
+
+Secondary:
+  -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
+  -drive if=xxx,driver=replication,mode=secondary,\
+         file.file.filename=active_disk.qcow2,\
+         file.driver=qcow2,\
+         file.backing.file.filename=hidden_disk.qcow2,\
+         file.backing.driver=qcow2,\
+         file.backing.backing=colo1
+
+  Then run qmp command in secondary qemu:
+    { 'execute': 'nbd-server-start',
+      'arguments': {
+          'addr': {
+              'type': 'inet',
+              'data': {
+                  'host': 'xxx',
+                  'port': 'xxx'
+              }
+          }
+      }
+    }
+    { 'execute': 'nbd-server-add',
+      'arguments': {
+          'device': 'colo1',
+          'writable': true
+      }
+    }
+
+  Note:
+  1. The export name in secondary QEMU command line is the secondary
+     disk's id.
+  2. The export name for the same disk must be the same
+  3. The qmp command nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on primary QEMU
+  4. Active disk, hidden disk and nbd target's length should be the
+     same.
+  5. It is better to put active disk and hidden disk in ramdisk.
+  6. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
+
+After Failover:
+Primary:
+  The secondary host is down, so we should run the following qmp command
+  to remove the nbd child from the quorum:
+  { 'execute': 'x-blockdev-change',
+    'arguments': {
+        'parent': 'colo1',
+        'child': 'children.1'
+    }
+  }
+  { 'execute': 'human-monitor-command',
+    'arguments': {
+        'command-line': 'drive_del xxxx'
+    }
+  }
+  Note: there is no qmp command to remove the blockdev now
+
+Secondary:
+  The primary host is down, so we should do the following thing:
+  { 'execute': 'nbd-server-stop' }
+
+TODO:
+1. Continuous block replication
+2. Shared disk
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 6/9] auto complete active commit
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (4 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 5/9] docs: block replication's description Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation Changlong Xie
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Auto complete mirror job in background to prevent from
blocking synchronously

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block/mirror.c            | 13 +++++++++----
 blockdev.c                |  2 +-
 include/block/block_int.h |  3 ++-
 qemu-img.c                |  2 +-
 4 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2c0edfa..79ec170 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -718,7 +718,8 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
                              BlockCompletionFunc *cb,
                              void *opaque, Error **errp,
                              const BlockJobDriver *driver,
-                             bool is_none_mode, BlockDriverState *base)
+                             bool is_none_mode, BlockDriverState *base,
+                             bool auto_complete)
 {
     MirrorBlockJob *s;
     BlockDriverState *replaced_bs;
@@ -774,6 +775,9 @@ static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
     s->granularity = granularity;
     s->buf_size = ROUND_UP(buf_size, granularity);
     s->unmap = unmap;
+    if (auto_complete) {
+        s->should_complete = true;
+    }
 
     s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
     if (!s->dirty_bitmap) {
@@ -815,14 +819,15 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     mirror_start_job(bs, target, replaces,
                      speed, granularity, buf_size,
                      on_source_error, on_target_error, unmap, cb, opaque, errp,
-                     &mirror_job_driver, is_none_mode, base);
+                     &mirror_job_driver, is_none_mode, base, false);
 }
 
 void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
                          int64_t speed,
                          BlockdevOnError on_error,
                          BlockCompletionFunc *cb,
-                         void *opaque, Error **errp)
+                         void *opaque, Error **errp,
+                         bool auto_complete)
 {
     int64_t length, base_length;
     int orig_base_flags;
@@ -863,7 +868,7 @@ void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
     bdrv_ref(base);
     mirror_start_job(bs, base, NULL, speed, 0, 0,
                      on_error, on_error, false, cb, opaque, &local_err,
-                     &commit_active_job_driver, false, base);
+                     &commit_active_job_driver, false, base, auto_complete);
     if (local_err) {
         error_propagate(errp, local_err);
         goto error_restore_flags;
diff --git a/blockdev.c b/blockdev.c
index 7c1d6da..fd7dce2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3079,7 +3079,7 @@ void qmp_block_commit(const char *device,
             goto out;
         }
         commit_active_start(bs, base_bs, speed, on_error, block_job_cb,
-                            bs, &local_err);
+                            bs, &local_err, false);
     } else {
         commit_start(bs, base_bs, top_bs, speed, on_error, block_job_cb, bs,
                      has_backing_file ? backing_file : NULL, &local_err);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index e67b6fb..9eb1db6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -638,13 +638,14 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
  * @cb: Completion function for the job.
  * @opaque: Opaque pointer value passed to @cb.
  * @errp: Error object.
+ * @auto_complete: Auto complete the job.
  *
  */
 void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
                          int64_t speed,
                          BlockdevOnError on_error,
                          BlockCompletionFunc *cb,
-                         void *opaque, Error **errp);
+                         void *opaque, Error **errp, bool auto_complete);
 /*
  * mirror_start:
  * @bs: Block device to operate on.
diff --git a/qemu-img.c b/qemu-img.c
index f121980..986cc6d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -756,7 +756,7 @@ static int img_commit(int argc, char **argv)
     };
 
     commit_active_start(bs, base_bs, 0, BLOCKDEV_ON_ERROR_REPORT,
-                        common_block_job_cb, &cbi, &local_err);
+                        common_block_job_cb, &cbi, &local_err, false);
     if (local_err) {
         goto done;
     }
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (5 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 6/9] auto complete active commit Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-15  0:57   ` Hailiang Zhang
  2016-03-04 16:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication Changlong Xie
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 Makefile.objs        |  1 +
 qapi/block-core.json | 13 ++++++++
 replication.c        | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 replication.h        | 53 +++++++++++++++++++++++++++++
 4 files changed, 161 insertions(+)
 create mode 100644 replication.c
 create mode 100644 replication.h

diff --git a/Makefile.objs b/Makefile.objs
index 06b95c7..a8c74b7 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qemu-io-cmds.o
+block-obj-y += replication.o
 
 block-obj-m = block/
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 7e9e8fe..12362b8 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2002,6 +2002,19 @@
             '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.6
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
diff --git a/replication.c b/replication.c
new file mode 100644
index 0000000..e8ac2f0
--- /dev/null
+++ b/replication.c
@@ -0,0 +1,94 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "replication.h"
+
+static QLIST_HEAD(, ReplicationState) replication_states;
+
+ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
+{
+    ReplicationState *rs;
+
+    rs = g_new0(ReplicationState, 1);
+    rs->opaque = opaque;
+    rs->ops = ops;
+    QLIST_INSERT_HEAD(&replication_states, rs, node);
+
+    return rs;
+}
+
+void replication_remove(ReplicationState *rs)
+{
+    QLIST_REMOVE(rs, node);
+    g_free(rs);
+}
+
+/*
+ * The caller of the function *MUST* make sure vm stopped
+ */
+void replication_start_all(ReplicationMode mode, Error **errp)
+{
+    ReplicationState *rs, *next;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->start) {
+            rs->ops->start(rs, mode, errp);
+        }
+        if (*errp != NULL) {
+            return;
+        }
+    }
+}
+
+void replication_do_checkpoint_all(Error **errp)
+{
+    ReplicationState *rs, *next;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->checkpoint) {
+            rs->ops->checkpoint(rs, errp);
+        }
+        if (*errp != NULL) {
+            return;
+        }
+    }
+}
+
+void replication_get_error_all(Error **errp)
+{
+    ReplicationState *rs, *next;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->get_error) {
+            rs->ops->get_error(rs, errp);
+        }
+        if (*errp != NULL) {
+            return;
+        }
+    }
+}
+
+void replication_stop_all(bool failover, Error **errp)
+{
+    ReplicationState *rs, *next;
+
+    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
+        if (rs->ops && rs->ops->stop) {
+            rs->ops->stop(rs, failover, errp);
+        }
+        if (*errp != NULL) {
+            return;
+        }
+    }
+}
diff --git a/replication.h b/replication.h
new file mode 100644
index 0000000..faea649
--- /dev/null
+++ b/replication.h
@@ -0,0 +1,53 @@
+/*
+ * Replication filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REPLICATION_H
+#define REPLICATION_H
+
+#include "sysemu/sysemu.h"
+
+typedef struct ReplicationOps ReplicationOps;
+typedef struct ReplicationState ReplicationState;
+typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
+typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
+typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
+typedef void (*GetError)(ReplicationState *rs, Error **errp);
+
+struct ReplicationState {
+    void *opaque;
+    ReplicationOps *ops;
+    QLIST_ENTRY(ReplicationState) node;
+};
+
+struct ReplicationOps{
+    Start start;
+    Checkpoint checkpoint;
+    GetError get_error;
+    Stop stop;
+};
+
+
+ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
+
+void replication_remove(ReplicationState *rs);
+
+void replication_start_all(ReplicationMode mode, Error **errp);
+
+void replication_do_checkpoint_all(Error **errp);
+
+void replication_get_error_all(Error **errp);
+
+void replication_stop_all(bool failover, Error **errp);
+
+#endif /* REPLICATION_H */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (6 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-03-04 17:39   ` Stefan Hajnoczi
  2016-03-04 17:53   ` Stefan Hajnoczi
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 9/9] support replication driver in blockdev-add Changlong Xie
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 594 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 595 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fa05f37..94c1d03 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 0000000..4631667
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,594 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 Intel Corporation
+ * Copyright (c) 2016 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "block/nbd.h"
+#include "block/blockjob.h"
+#include "block/block_int.h"
+#include "replication.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    BdrvChild *active_disk;
+    BdrvChild *hidden_disk;
+    BdrvChild *secondary_disk;
+    BlockDriverState *top_bs;
+    ReplicationState *rs;
+    Error *blocker;
+    int orig_hidden_flags;
+    int orig_secondary_flags;
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,             /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,          /* block replication is running */
+    BLOCK_REPLICATION_FAILOVER,         /* failover is running in background */
+    BLOCK_REPLICATION_FAILOVER_FAILED,  /* failover failed */
+    BLOCK_REPLICATION_DONE,             /* block replication is done(after failover) */
+};
+
+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+                              Error **errp);
+static void replication_do_checkpoint(ReplicationState *rs, Error **errp);
+static void replication_get_error(ReplicationState *rs, Error **errp);
+static void replication_stop(ReplicationState *rs, bool failover,
+                             Error **errp);
+
+#define REPLICATION_MODE        "mode"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end of list */ }
+    },
+};
+
+static ReplicationOps replication_ops = {
+    .start = replication_start,
+    .checkpoint = replication_do_checkpoint,
+    .get_error = replication_get_error,
+    .stop = replication_stop,
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    s->rs = replication_new(bs, &replication_ops);
+
+    ret = 0;
+
+fail:
+    qemu_opts_del(opts);
+    error_propagate(errp, local_err);
+
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(s->rs, false, NULL);
+        replication_remove(s->rs);
+    }
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file->bs);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+    switch (s->replication_state) {
+    case BLOCK_REPLICATION_NONE:
+        return -EIO;
+    case BLOCK_REPLICATION_RUNNING:
+        return 0;
+    case BLOCK_REPLICATION_FAILOVER:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 0;
+    case BLOCK_REPLICATION_FAILOVER_FAILED:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+    case BLOCK_REPLICATION_DONE:
+        /*
+         * active commit job completes, and active disk and secondary_disk
+         * is swapped, so we can operate bs->file directly
+         */
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 0;
+    default:
+        abort();
+    }
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        return ret;
+    }
+
+    if (ret < 0) {
+        s->error = ret;
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        /* We only use it to forward primary write requests */
+        return -EIO;
+    }
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_co_readv(bs->file->bs, sector_num, remaining_sectors, qiov);
+    return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int remaining_sectors,
+                                              QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    uint64_t bytes_done = 0;
+    BdrvChild *top = bs->file;
+    BdrvChild *base = s->secondary_disk;
+    BlockDriverState *target;
+    int ret, n;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 0) {
+        ret = bdrv_co_writev(top->bs, sector_num,
+                             remaining_sectors, qiov);
+        return replication_return_value(s, ret);
+    }
+
+    /*
+     * Failover failed, only write to active disk if the sectors
+     * have already been allocated in active disk/hidden disk.
+     */
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    while (remaining_sectors > 0) {
+        ret = bdrv_is_allocated_above(top->bs, base->bs, sector_num,
+                                      remaining_sectors, &n);
+        if (ret < 0) {
+            return ret;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
+
+        target = ret ? (top->bs) : (base->bs);
+        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
+        if (ret < 0) {
+            return ret;
+        }
+
+        remaining_sectors -= n;
+        sector_num += n;
+        bytes_done += n * BDRV_SECTOR_SIZE;
+    }
+
+    return 0;
+}
+
+static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                    BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
+}
+
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->bs->job) {
+        error_setg(errp, "Backup job was cancelled unexpectedly");
+        return;
+    }
+
+    backup_do_checkpoint(s->secondary_disk->bs->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->bs->drv->bdrv_make_empty(s->active_disk->bs);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = s->hidden_disk->bs->drv->bdrv_make_empty(s->hidden_disk->bs);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+static void reopen_backing_file(BDRVReplicationState *s, bool writable,
+                                Error **errp)
+{
+    BlockReopenQueue *reopen_queue = NULL;
+    int orig_hidden_flags, orig_secondary_flags;
+    int new_hidden_flags, new_secondary_flags;
+    Error *local_err = NULL;
+
+    if (writable) {
+        orig_hidden_flags = s->orig_hidden_flags =
+                                bdrv_get_flags(s->hidden_disk->bs);
+        new_hidden_flags = (orig_hidden_flags | BDRV_O_RDWR) &
+                                                    ~BDRV_O_INACTIVE;
+        orig_secondary_flags = s->orig_secondary_flags =
+                                bdrv_get_flags(s->secondary_disk->bs);
+        new_secondary_flags = (orig_secondary_flags | BDRV_O_RDWR) &
+                                                     ~BDRV_O_INACTIVE;
+    } else {
+        orig_hidden_flags = (s->orig_hidden_flags | BDRV_O_RDWR) &
+                                                    ~BDRV_O_INACTIVE;
+        new_hidden_flags = s->orig_hidden_flags;
+        orig_secondary_flags = (s->orig_secondary_flags | BDRV_O_RDWR) &
+                                                    ~BDRV_O_INACTIVE;
+        new_secondary_flags = s->orig_secondary_flags;
+    }
+
+    if (orig_hidden_flags != new_hidden_flags) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk->bs, NULL,
+                                         new_hidden_flags);
+    }
+
+    if (!(orig_secondary_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk->bs, NULL,
+                                         new_secondary_flags);
+    }
+
+    if (reopen_queue) {
+        bdrv_reopen_multiple(reopen_queue, &local_err);
+        error_propagate(errp, local_err);
+    }
+}
+
+static void backup_job_cleanup(BDRVReplicationState *s)
+{
+    bdrv_op_unblock_all(s->top_bs, s->blocker);
+    error_free(s->blocker);
+    reopen_backing_file(s, false, NULL);
+}
+
+static void backup_job_completed(void *opaque, int ret)
+{
+    BDRVReplicationState *s = opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_FAILOVER) {
+        /* The backup job is cancelled unexpectedly */
+        s->error = -EIO;
+    }
+
+    backup_job_cleanup(s);
+}
+
+static BlockDriverState *get_top_bs(BlockDriverState *bs)
+{
+    BdrvChild *child;
+
+    while (!bs->blk) {
+        if (QLIST_EMPTY(&bs->parents)) {
+            return NULL;
+        }
+
+        child = QLIST_FIRST(&bs->parents);
+        if (QLIST_NEXT(child, next_parent)) {
+            return NULL;
+        }
+
+        bs = child->parent;
+    }
+
+    return bs;
+}
+
+static void replication_start(ReplicationState *rs, ReplicationMode mode,
+                              Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s = bs->opaque;
+    int64_t active_length, hidden_length, disk_length;
+    AioContext *aio_context;
+    Error *local_err = NULL;
+
+    if (s->replication_state != BLOCK_REPLICATION_NONE) {
+        error_setg(errp, "Block replication is running or done");
+        return;
+    }
+
+    if (s->mode != mode) {
+        error_setg(errp, "The parameter mode's value is invalid, needs %d,"
+                   " but got %d", s->mode, mode);
+        return;
+    }
+
+    aio_context = bdrv_get_aio_context(bs);
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        s->active_disk = bs->file;
+        if (!s->active_disk || !s->active_disk->bs ||
+                                    !s->active_disk->bs->backing) {
+            error_setg(errp, "Active disk doesn't have backing file");
+            return;
+        }
+
+        s->hidden_disk = s->active_disk->bs->backing;
+        if (!s->hidden_disk->bs || !s->hidden_disk->bs->backing) {
+            error_setg(errp, "Hidden disk doesn't have backing file");
+            return;
+        }
+
+        s->secondary_disk = s->hidden_disk->bs->backing;
+        if (!s->secondary_disk->bs || !s->secondary_disk->bs->blk) {
+            error_setg(errp, "The secondary disk doesn't have block backend");
+            return;
+        }
+
+        s->top_bs = get_top_bs(bs);
+        if (!s->top_bs) {
+            error_setg(errp, "Cannot get the top block driver state to do"
+                       " internal backup");
+            return;
+        }
+
+        /* verify the length */
+        active_length = bdrv_getlength(s->active_disk->bs);
+        hidden_length = bdrv_getlength(s->hidden_disk->bs);
+        disk_length = bdrv_getlength(s->secondary_disk->bs);
+        if (active_length < 0 || hidden_length < 0 || disk_length < 0 ||
+            active_length != hidden_length || hidden_length != disk_length) {
+            error_setg(errp, "active disk, hidden disk, secondary disk's length"
+                       " are not the same");
+            return;
+        }
+
+        if (!s->active_disk->bs->drv->bdrv_make_empty ||
+            !s->hidden_disk->bs->drv->bdrv_make_empty) {
+            error_setg(errp,
+                       "active disk or hidden disk doesn't support make_empty");
+            return;
+        }
+
+        /* reopen the backing file in r/w mode */
+        reopen_backing_file(s, true, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        /* start backup job now */
+        error_setg(&s->blocker,
+                   "block device is in use by internal backup job");
+        bdrv_op_block_all(s->top_bs, s->blocker);
+        bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+
+        /*
+         * Must protect backup target if backup job was stopped/cancelled
+         * unexpectedly
+         */
+        bdrv_ref(s->hidden_disk->bs);
+
+        aio_context_acquire(aio_context);
+        backup_start(s->secondary_disk->bs, s->hidden_disk->bs, 0,
+                     MIRROR_SYNC_MODE_NONE, NULL, BLOCKDEV_ON_ERROR_REPORT,
+                     BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
+                     s, NULL, &local_err);
+        aio_context_release(aio_context);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            backup_job_cleanup(s);
+            bdrv_unref(s->hidden_disk->bs);
+            return;
+        }
+        break;
+    default:
+        abort();
+    }
+
+    s->replication_state = BLOCK_REPLICATION_RUNNING;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        aio_context_acquire(aio_context);
+        secondary_do_checkpoint(s, errp);
+        aio_context_release(aio_context);
+    }
+
+    s->error = 0;
+}
+
+static void replication_do_checkpoint(ReplicationState *rs, Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s = bs->opaque;
+    AioContext *aio_context;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+        secondary_do_checkpoint(s, errp);
+        aio_context_release(aio_context);
+    }
+}
+
+static void replication_get_error(ReplicationState *rs, Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    if (s->error) {
+        error_setg(errp, "I/O error occurred");
+        return;
+    }
+}
+
+static void replication_done(void *opaque, int ret)
+{
+    BlockDriverState *bs = opaque;
+    BDRVReplicationState *s = bs->opaque;
+
+    if (ret == 0) {
+        s->replication_state = BLOCK_REPLICATION_DONE;
+
+        /* refresh top bs's filename */
+        bdrv_refresh_filename(bs);
+        s->active_disk = NULL;
+        s->secondary_disk = NULL;
+        s->hidden_disk = NULL;
+        s->error = 0;
+    } else {
+        s->replication_state = BLOCK_REPLICATION_FAILOVER_FAILED;
+        s->error = -EIO;
+    }
+}
+
+static void replication_stop(ReplicationState *rs, bool failover, Error **errp)
+{
+    BlockDriverState *bs = rs->opaque;
+    BDRVReplicationState *s = bs->opaque;
+    AioContext *aio_context;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        s->replication_state = BLOCK_REPLICATION_DONE;
+        s->error = 0;
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        aio_context = bdrv_get_aio_context(bs);
+        if (!failover) {
+            /*
+             * This BDS will be closed, and the job should be completed
+             * before the BDS is closed, because we will access hidden
+             * disk, secondary disk in backup_job_completed().
+             */
+            if (s->secondary_disk->bs->job) {
+                block_job_cancel_sync(s->secondary_disk->bs->job);
+            }
+            aio_context_acquire(aio_context);
+            secondary_do_checkpoint(s, errp);
+            aio_context_release(aio_context);
+            s->replication_state = BLOCK_REPLICATION_DONE;
+            return;
+        }
+
+        s->replication_state = BLOCK_REPLICATION_FAILOVER;
+        if (s->secondary_disk->bs->job) {
+            block_job_cancel(s->secondary_disk->bs->job);
+        }
+
+        aio_context_acquire(aio_context);
+        commit_active_start(s->active_disk->bs, s->secondary_disk->bs, 0,
+                            BLOCKDEV_ON_ERROR_REPORT, replication_done,
+                            bs, errp, true);
+        aio_context_release(aio_context);
+        break;
+    default:
+        abort();
+    }
+}
+
+BlockDriver bdrv_replication = {
+    .format_name                = "replication",
+    .protocol_name              = "replication",
+    .instance_size              = sizeof(BDRVReplicationState),
+
+    .bdrv_open                  = replication_open,
+    .bdrv_close                 = replication_close,
+
+    .bdrv_getlength             = replication_getlength,
+    .bdrv_co_readv              = replication_co_readv,
+    .bdrv_co_writev             = replication_co_writev,
+
+    .is_filter                  = true,
+    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
+
+    .has_variable_length        = true,
+};
+
+static void bdrv_replication_init(void)
+{
+    bdrv_register(&bdrv_replication);
+}
+
+block_init(bdrv_replication_init);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v15 9/9] support replication driver in blockdev-add
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (7 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication Changlong Xie
@ 2016-02-05  4:18 ` Changlong Xie
  2016-02-15  7:02 ` [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-05  4:18 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi/block-core.json | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 12362b8..af723d5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -247,6 +247,7 @@
 #       2.2: 'archipelago' added, 'cow' dropped
 #       2.3: 'host_floppy' deprecated
 #       2.5: 'host_floppy' dropped
+#       2.6: 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1567,6 +1568,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.6
 #
 # Since: 2.0
 ##
@@ -1574,8 +1576,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
             'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
             'http', 'https', 'null-aio', 'null-co', 'parallels',
-            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-            'vmdk', 'vpc', 'vvfat' ] }
+            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+            'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -2015,6 +2017,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.6
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -2051,6 +2066,7 @@
       'quorum':     'BlockdevOptionsQuorum',
       'raw':        'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+      'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
       'tftp':       'BlockdevOptionsFile',
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation Changlong Xie
@ 2016-02-15  0:57   ` Hailiang Zhang
  2016-02-15  1:13     ` Wen Congyang
  2016-02-15  2:25     ` Changlong Xie
  2016-03-04 16:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 2 replies; 26+ messages in thread
From: Hailiang Zhang @ 2016-02-15  0:57 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, peter.huangpeng,
	Dr. David Alan Gilbert

On 2016/2/5 12:18, Changlong Xie wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
> ---
>   Makefile.objs        |  1 +
>   qapi/block-core.json | 13 ++++++++
>   replication.c        | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   replication.h        | 53 +++++++++++++++++++++++++++++
>   4 files changed, 161 insertions(+)
>   create mode 100644 replication.c
>   create mode 100644 replication.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 06b95c7..a8c74b7 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>   block-obj-$(CONFIG_WIN32) += aio-win32.o
>   block-obj-y += block/
>   block-obj-y += qemu-io-cmds.o
> +block-obj-y += replication.o
>
>   block-obj-m = block/
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 7e9e8fe..12362b8 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2002,6 +2002,19 @@
>               '*read-pattern': 'QuorumReadPattern' } }
>
>   ##
> +# @ReplicationMode
> +#
> +# An enumeration of replication modes.
> +#
> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
> +#
> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
> +#
> +# Since: 2.6
> +##
> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
> +
> +##
>   # @BlockdevOptions
>   #
>   # Options for creating a block device.
> diff --git a/replication.c b/replication.c
> new file mode 100644
> index 0000000..e8ac2f0
> --- /dev/null
> +++ b/replication.c
> @@ -0,0 +1,94 @@
> +/*
> + * Replication filter
> + *
> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + * Copyright (c) 2016 Intel Corporation
> + * Copyright (c) 2016 FUJITSU LIMITED
> + *
> + * Author:
> + *   Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "replication.h"
> +
> +static QLIST_HEAD(, ReplicationState) replication_states;
> +
> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
> +{
> +    ReplicationState *rs;
> +
> +    rs = g_new0(ReplicationState, 1);
> +    rs->opaque = opaque;
> +    rs->ops = ops;
> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
> +
> +    return rs;
> +}
> +
> +void replication_remove(ReplicationState *rs)
> +{
> +    QLIST_REMOVE(rs, node);
> +    g_free(rs);
> +}
> +
> +/*
> + * The caller of the function *MUST* make sure vm stopped
> + */
> +void replication_start_all(ReplicationMode mode, Error **errp)
> +{

Is this public API is only used for block ?
If yes, I'd like it with a 'block_' prefix.

> +    ReplicationState *rs, *next;
> +
> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
> +        if (rs->ops && rs->ops->start) {
> +            rs->ops->start(rs, mode, errp);
> +        }
> +        if (*errp != NULL) {

This is incorrect, you miss checking if errp is NULL,
if errp is NULL, there will be an error that accessing memory at address 0x0.
Same with other places in this patch.

> +            return;
> +        }
> +    }
> +}
> +
> +void replication_do_checkpoint_all(Error **errp)
> +{
> +    ReplicationState *rs, *next;
> +
> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
> +        if (rs->ops && rs->ops->checkpoint) {
> +            rs->ops->checkpoint(rs, errp);
> +        }
> +        if (*errp != NULL) {
> +            return;

> +        }
> +    }
> +}
> +
> +void replication_get_error_all(Error **errp)
> +{
> +    ReplicationState *rs, *next;
> +
> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
> +        if (rs->ops && rs->ops->get_error) {
> +            rs->ops->get_error(rs, errp);
> +        }
> +        if (*errp != NULL) {

> +            return;
> +        }
> +    }
> +}
> +
> +void replication_stop_all(bool failover, Error **errp)
> +{
> +    ReplicationState *rs, *next;
> +
> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
> +        if (rs->ops && rs->ops->stop) {
> +            rs->ops->stop(rs, failover, errp);
> +        }
> +        if (*errp != NULL) {
> +            return;
> +        }
> +    }
> +}
> diff --git a/replication.h b/replication.h
> new file mode 100644
> index 0000000..faea649
> --- /dev/null
> +++ b/replication.h
> @@ -0,0 +1,53 @@
> +/*
> + * Replication filter
> + *
> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + * Copyright (c) 2016 Intel Corporation
> + * Copyright (c) 2016 FUJITSU LIMITED
> + *
> + * Author:
> + *   Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef REPLICATION_H
> +#define REPLICATION_H
> +
> +#include "sysemu/sysemu.h"
> +
> +typedef struct ReplicationOps ReplicationOps;
> +typedef struct ReplicationState ReplicationState;
> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
> +typedef void (*GetError)(ReplicationState *rs, Error **errp);
> +
> +struct ReplicationState {
> +    void *opaque;
> +    ReplicationOps *ops;
> +    QLIST_ENTRY(ReplicationState) node;
> +};
> +
> +struct ReplicationOps{
> +    Start start;
> +    Checkpoint checkpoint;
> +    GetError get_error;
> +    Stop stop;
> +};
> +
> +
> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
> +
> +void replication_remove(ReplicationState *rs);
> +
> +void replication_start_all(ReplicationMode mode, Error **errp);
> +
> +void replication_do_checkpoint_all(Error **errp);
> +
> +void replication_get_error_all(Error **errp);
> +
> +void replication_stop_all(bool failover, Error **errp);
> +
> +#endif /* REPLICATION_H */
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-15  0:57   ` Hailiang Zhang
@ 2016-02-15  1:13     ` Wen Congyang
  2016-02-19  8:41       ` Hailiang Zhang
  2016-02-15  2:25     ` Changlong Xie
  1 sibling, 1 reply; 26+ messages in thread
From: Wen Congyang @ 2016-02-15  1:13 UTC (permalink / raw)
  To: Hailiang Zhang, Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, peter.huangpeng,
	Dr. David Alan Gilbert

On 02/15/2016 08:57 AM, Hailiang Zhang wrote:
> On 2016/2/5 12:18, Changlong Xie wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
>> ---
>>   Makefile.objs        |  1 +
>>   qapi/block-core.json | 13 ++++++++
>>   replication.c        | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   replication.h        | 53 +++++++++++++++++++++++++++++
>>   4 files changed, 161 insertions(+)
>>   create mode 100644 replication.c
>>   create mode 100644 replication.h
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 06b95c7..a8c74b7 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>>   block-obj-$(CONFIG_WIN32) += aio-win32.o
>>   block-obj-y += block/
>>   block-obj-y += qemu-io-cmds.o
>> +block-obj-y += replication.o
>>
>>   block-obj-m = block/
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 7e9e8fe..12362b8 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2002,6 +2002,19 @@
>>               '*read-pattern': 'QuorumReadPattern' } }
>>
>>   ##
>> +# @ReplicationMode
>> +#
>> +# An enumeration of replication modes.
>> +#
>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>> +#
>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
>> +
>> +##
>>   # @BlockdevOptions
>>   #
>>   # Options for creating a block device.
>> diff --git a/replication.c b/replication.c
>> new file mode 100644
>> index 0000000..e8ac2f0
>> --- /dev/null
>> +++ b/replication.c
>> @@ -0,0 +1,94 @@
>> +/*
>> + * Replication filter
>> + *
>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2016 Intel Corporation
>> + * Copyright (c) 2016 FUJITSU LIMITED
>> + *
>> + * Author:
>> + *   Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "replication.h"
>> +
>> +static QLIST_HEAD(, ReplicationState) replication_states;
>> +
>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
>> +{
>> +    ReplicationState *rs;
>> +
>> +    rs = g_new0(ReplicationState, 1);
>> +    rs->opaque = opaque;
>> +    rs->ops = ops;
>> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
>> +
>> +    return rs;
>> +}
>> +
>> +void replication_remove(ReplicationState *rs)
>> +{
>> +    QLIST_REMOVE(rs, node);
>> +    g_free(rs);
>> +}
>> +
>> +/*
>> + * The caller of the function *MUST* make sure vm stopped
>> + */
>> +void replication_start_all(ReplicationMode mode, Error **errp)
>> +{
> 
> Is this public API is only used for block ?
> If yes, I'd like it with a 'block_' prefix.

No, we hope it can be used for nic too.

Thanks
Wen Congyang

> 
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->start) {
>> +            rs->ops->start(rs, mode, errp);
>> +        }
>> +        if (*errp != NULL) {
> 
> This is incorrect, you miss checking if errp is NULL,
> if errp is NULL, there will be an error that accessing memory at address 0x0.
> Same with other places in this patch.
> 
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void replication_do_checkpoint_all(Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->checkpoint) {
>> +            rs->ops->checkpoint(rs, errp);
>> +        }
>> +        if (*errp != NULL) {
>> +            return;
> 
>> +        }
>> +    }
>> +}
>> +
>> +void replication_get_error_all(Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->get_error) {
>> +            rs->ops->get_error(rs, errp);
>> +        }
>> +        if (*errp != NULL) {
> 
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void replication_stop_all(bool failover, Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->stop) {
>> +            rs->ops->stop(rs, failover, errp);
>> +        }
>> +        if (*errp != NULL) {
>> +            return;
>> +        }
>> +    }
>> +}
>> diff --git a/replication.h b/replication.h
>> new file mode 100644
>> index 0000000..faea649
>> --- /dev/null
>> +++ b/replication.h
>> @@ -0,0 +1,53 @@
>> +/*
>> + * Replication filter
>> + *
>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2016 Intel Corporation
>> + * Copyright (c) 2016 FUJITSU LIMITED
>> + *
>> + * Author:
>> + *   Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef REPLICATION_H
>> +#define REPLICATION_H
>> +
>> +#include "sysemu/sysemu.h"
>> +
>> +typedef struct ReplicationOps ReplicationOps;
>> +typedef struct ReplicationState ReplicationState;
>> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
>> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
>> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
>> +typedef void (*GetError)(ReplicationState *rs, Error **errp);
>> +
>> +struct ReplicationState {
>> +    void *opaque;
>> +    ReplicationOps *ops;
>> +    QLIST_ENTRY(ReplicationState) node;
>> +};
>> +
>> +struct ReplicationOps{
>> +    Start start;
>> +    Checkpoint checkpoint;
>> +    GetError get_error;
>> +    Stop stop;
>> +};
>> +
>> +
>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
>> +
>> +void replication_remove(ReplicationState *rs);
>> +
>> +void replication_start_all(ReplicationMode mode, Error **errp);
>> +
>> +void replication_do_checkpoint_all(Error **errp);
>> +
>> +void replication_get_error_all(Error **errp);
>> +
>> +void replication_stop_all(bool failover, Error **errp);
>> +
>> +#endif /* REPLICATION_H */
>>
> 
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-15  0:57   ` Hailiang Zhang
  2016-02-15  1:13     ` Wen Congyang
@ 2016-02-15  2:25     ` Changlong Xie
  1 sibling, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-15  2:25 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, peter.huangpeng,
	Dr. David Alan Gilbert

On 02/15/2016 08:57 AM, Hailiang Zhang wrote:
> On 2016/2/5 12:18, Changlong Xie wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
>> ---
>>   Makefile.objs        |  1 +
>>   qapi/block-core.json | 13 ++++++++
>>   replication.c        | 94
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   replication.h        | 53 +++++++++++++++++++++++++++++
>>   4 files changed, 161 insertions(+)
>>   create mode 100644 replication.c
>>   create mode 100644 replication.h
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 06b95c7..a8c74b7 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>>   block-obj-$(CONFIG_WIN32) += aio-win32.o
>>   block-obj-y += block/
>>   block-obj-y += qemu-io-cmds.o
>> +block-obj-y += replication.o
>>
>>   block-obj-m = block/
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 7e9e8fe..12362b8 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -2002,6 +2002,19 @@
>>               '*read-pattern': 'QuorumReadPattern' } }
>>
>>   ##
>> +# @ReplicationMode
>> +#
>> +# An enumeration of replication modes.
>> +#
>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>> +#
>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>> +#
>> +# Since: 2.6
>> +##
>> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
>> +
>> +##
>>   # @BlockdevOptions
>>   #
>>   # Options for creating a block device.
>> diff --git a/replication.c b/replication.c
>> new file mode 100644
>> index 0000000..e8ac2f0
>> --- /dev/null
>> +++ b/replication.c
>> @@ -0,0 +1,94 @@
>> +/*
>> + * Replication filter
>> + *
>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2016 Intel Corporation
>> + * Copyright (c) 2016 FUJITSU LIMITED
>> + *
>> + * Author:
>> + *   Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "replication.h"
>> +
>> +static QLIST_HEAD(, ReplicationState) replication_states;
>> +
>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
>> +{
>> +    ReplicationState *rs;
>> +
>> +    rs = g_new0(ReplicationState, 1);
>> +    rs->opaque = opaque;
>> +    rs->ops = ops;
>> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
>> +
>> +    return rs;
>> +}
>> +
>> +void replication_remove(ReplicationState *rs)
>> +{
>> +    QLIST_REMOVE(rs, node);
>> +    g_free(rs);
>> +}
>> +
>> +/*
>> + * The caller of the function *MUST* make sure vm stopped
>> + */
>> +void replication_start_all(ReplicationMode mode, Error **errp)
>> +{
>
> Is this public API is only used for block ?
> If yes, I'd like it with a 'block_' prefix.
>
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->start) {
>> +            rs->ops->start(rs, mode, errp);
>> +        }
>> +        if (*errp != NULL) {
>
> This is incorrect, you miss checking if errp is NULL,
> if errp is NULL, there will be an error that accessing memory at address
> 0x0.
> Same with other places in this patch.

You are right, will fix in next version.

Thanks
	-Xie

>
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void replication_do_checkpoint_all(Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->checkpoint) {
>> +            rs->ops->checkpoint(rs, errp);
>> +        }
>> +        if (*errp != NULL) {
>> +            return;
>
>> +        }
>> +    }
>> +}
>> +
>> +void replication_get_error_all(Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->get_error) {
>> +            rs->ops->get_error(rs, errp);
>> +        }
>> +        if (*errp != NULL) {
>
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void replication_stop_all(bool failover, Error **errp)
>> +{
>> +    ReplicationState *rs, *next;
>> +
>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>> +        if (rs->ops && rs->ops->stop) {
>> +            rs->ops->stop(rs, failover, errp);
>> +        }
>> +        if (*errp != NULL) {
>> +            return;
>> +        }
>> +    }
>> +}
>> diff --git a/replication.h b/replication.h
>> new file mode 100644
>> index 0000000..faea649
>> --- /dev/null
>> +++ b/replication.h
>> @@ -0,0 +1,53 @@
>> +/*
>> + * Replication filter
>> + *
>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2016 Intel Corporation
>> + * Copyright (c) 2016 FUJITSU LIMITED
>> + *
>> + * Author:
>> + *   Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or
>> later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef REPLICATION_H
>> +#define REPLICATION_H
>> +
>> +#include "sysemu/sysemu.h"
>> +
>> +typedef struct ReplicationOps ReplicationOps;
>> +typedef struct ReplicationState ReplicationState;
>> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode,
>> Error **errp);
>> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
>> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
>> +typedef void (*GetError)(ReplicationState *rs, Error **errp);
>> +
>> +struct ReplicationState {
>> +    void *opaque;
>> +    ReplicationOps *ops;
>> +    QLIST_ENTRY(ReplicationState) node;
>> +};
>> +
>> +struct ReplicationOps{
>> +    Start start;
>> +    Checkpoint checkpoint;
>> +    GetError get_error;
>> +    Stop stop;
>> +};
>> +
>> +
>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
>> +
>> +void replication_remove(ReplicationState *rs);
>> +
>> +void replication_start_all(ReplicationMode mode, Error **errp);
>> +
>> +void replication_do_checkpoint_all(Error **errp);
>> +
>> +void replication_get_error_all(Error **errp);
>> +
>> +void replication_stop_all(bool failover, Error **errp);
>> +
>> +#endif /* REPLICATION_H */
>>
>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (8 preceding siblings ...)
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 9/9] support replication driver in blockdev-add Changlong Xie
@ 2016-02-15  7:02 ` Changlong Xie
  2016-02-17  9:32 ` Hailiang Zhang
  2016-02-23  9:10 ` Changlong Xie
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-15  7:02 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei

Hi all

It seems no any review during the long Spring festival. Ping.... :)

Thanks
	-Xie

On 02/05/2016 12:17 PM, Changlong Xie wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
>
> You can get the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
>
> Usage:
> Please refer to docs/block-replication.txt
>
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html
>
> Patch status:
> 1. Acked patches: none
> 2. Reviewed patches: patch 4
> 3. Updated patches: patch 3, 4, 5, 7, 8
> Note: patch 7 is a new patch that introduces genernal start/stop/checkpoint/
> get_error
> APIs.
>
> You can get the patch here:
> https://github.com/Pating/qemu/tree/changlox/block-replication-v15
>
> You can get the patch with framework here:
> https://github.com/Pating/qemu/tree/changlox/colo_framework_v14
>
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>     are accepted.
>
> Changs Log:
> V15:
> 1. Rebase to the newest codes
> 2. Fix typos and coding style addresed Eric's comments
> 3. Address Stefan's comments
>     1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
>     2) Update the message and description for [PATCH 4/9]
>     3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
>     4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
>     5) Use BdrvChild instead of holding on to BlockDriverState * pointers
> 4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
> 5. Introduce replication_get_error_all to check replication status
> 6. Remove useless discard interface
> V14:
> 1. Implement auto complete active commit
> 2. Implement active commit block job for replication.c
> 3. Address the comments from Stefan, add replication-specific API and data
>     structure, also remove old block layer APIs
> V13:
> 1. Rebase to the newest codes
> 2. Remove redundant marcos and semicolon in replication.c
> 3. Fix typos in block-replication.txt
> V12:
> 1. Rebase to the newest codes
> 2. Use backing reference to replcace 'allow-write-backing-file'
> V11:
> 1. Reopen the backing file when starting blcok replication if it is not
>     opened in R/W mode
> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>     when opening backing file
> 3. Block the top BDS so there is only one block job for the top BDS and
>     its backing chain.
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>     reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split child add/delete support. These patches are sent in another patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed the failover up. The secondary vm can take over very quickly even
>     if there are too many I/O requests.
> V4:
> 1. Introduce a new driver replication to avoid touch nbd and qcow2.
> V3:
> 1: use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target uses the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
>
> Changlong Xie (1):
>    Introduce new APIs to do replication operation
>
> Wen Congyang (8):
>    unblock backup operations in backing file
>    Store parent BDS in BdrvChild
>    Backup: clear all bitmap when doing block checkpoint
>    Link backup into block core
>    docs: block replication's description
>    auto complete active commit
>    Implement new driver for block replication
>    support replication driver in blockdev-add
>
>   Makefile.objs              |   1 +
>   block.c                    |  19 ++
>   block/Makefile.objs        |   3 +-
>   block/backup.c             |  15 ++
>   block/mirror.c             |  13 +-
>   block/replication.c        | 594 +++++++++++++++++++++++++++++++++++++++++++++
>   blockdev.c                 |   2 +-
>   docs/block-replication.txt | 238 ++++++++++++++++++
>   include/block/block_int.h  |   6 +-
>   qapi/block-core.json       |  33 ++-
>   qemu-img.c                 |   2 +-
>   replication.c              |  94 +++++++
>   replication.h              |  53 ++++
>   13 files changed, 1063 insertions(+), 10 deletions(-)
>   create mode 100644 block/replication.c
>   create mode 100644 docs/block-replication.txt
>   create mode 100644 replication.c
>   create mode 100644 replication.h
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (9 preceding siblings ...)
  2016-02-15  7:02 ` [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
@ 2016-02-17  9:32 ` Hailiang Zhang
  2016-02-23  9:10 ` Changlong Xie
  11 siblings, 0 replies; 26+ messages in thread
From: Hailiang Zhang @ 2016-02-17  9:32 UTC (permalink / raw)
  To: Changlong Xie, qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini,
	Kevin Wolf, Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, peter.huangpeng,
	Michael R. Hines, Dr. David Alan Gilbert, Gonglei

ping...

COLO prototype is based on this series, but it seems that this
series didn't got enough reviewing and feedback, we will miss
the train of qemu 2.6 version :(
Since COLO is still a prototype, some problems could be fixed in
later developing, and we hope COLO prototype to be merged as quickly
as possible, could you please speed up the process of reviewing ?

Thanks,
Hailiang

On 2016/2/5 12:17, Changlong Xie wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
>
> You can get the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
>
> Usage:
> Please refer to docs/block-replication.txt
>
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html
>
> Patch status:
> 1. Acked patches: none
> 2. Reviewed patches: patch 4
> 3. Updated patches: patch 3, 4, 5, 7, 8
> Note: patch 7 is a new patch that introduces genernal start/stop/checkpoint/
> get_error
> APIs.
>
> You can get the patch here:
> https://github.com/Pating/qemu/tree/changlox/block-replication-v15
>
> You can get the patch with framework here:
> https://github.com/Pating/qemu/tree/changlox/colo_framework_v14
>
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>     are accepted.
>
> Changs Log:
> V15:
> 1. Rebase to the newest codes
> 2. Fix typos and coding style addresed Eric's comments
> 3. Address Stefan's comments
>     1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
>     2) Update the message and description for [PATCH 4/9]
>     3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
>     4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
>     5) Use BdrvChild instead of holding on to BlockDriverState * pointers
> 4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
> 5. Introduce replication_get_error_all to check replication status
> 6. Remove useless discard interface
> V14:
> 1. Implement auto complete active commit
> 2. Implement active commit block job for replication.c
> 3. Address the comments from Stefan, add replication-specific API and data
>     structure, also remove old block layer APIs
> V13:
> 1. Rebase to the newest codes
> 2. Remove redundant marcos and semicolon in replication.c
> 3. Fix typos in block-replication.txt
> V12:
> 1. Rebase to the newest codes
> 2. Use backing reference to replcace 'allow-write-backing-file'
> V11:
> 1. Reopen the backing file when starting blcok replication if it is not
>     opened in R/W mode
> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>     when opening backing file
> 3. Block the top BDS so there is only one block job for the top BDS and
>     its backing chain.
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>     reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split child add/delete support. These patches are sent in another patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed the failover up. The secondary vm can take over very quickly even
>     if there are too many I/O requests.
> V4:
> 1. Introduce a new driver replication to avoid touch nbd and qcow2.
> V3:
> 1: use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target uses the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
>
> Changlong Xie (1):
>    Introduce new APIs to do replication operation
>
> Wen Congyang (8):
>    unblock backup operations in backing file
>    Store parent BDS in BdrvChild
>    Backup: clear all bitmap when doing block checkpoint
>    Link backup into block core
>    docs: block replication's description
>    auto complete active commit
>    Implement new driver for block replication
>    support replication driver in blockdev-add
>
>   Makefile.objs              |   1 +
>   block.c                    |  19 ++
>   block/Makefile.objs        |   3 +-
>   block/backup.c             |  15 ++
>   block/mirror.c             |  13 +-
>   block/replication.c        | 594 +++++++++++++++++++++++++++++++++++++++++++++
>   blockdev.c                 |   2 +-
>   docs/block-replication.txt | 238 ++++++++++++++++++
>   include/block/block_int.h  |   6 +-
>   qapi/block-core.json       |  33 ++-
>   qemu-img.c                 |   2 +-
>   replication.c              |  94 +++++++
>   replication.h              |  53 ++++
>   13 files changed, 1063 insertions(+), 10 deletions(-)
>   create mode 100644 block/replication.c
>   create mode 100644 docs/block-replication.txt
>   create mode 100644 replication.c
>   create mode 100644 replication.h
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-15  1:13     ` Wen Congyang
@ 2016-02-19  8:41       ` Hailiang Zhang
  2016-02-19  8:55         ` Wen Congyang
  0 siblings, 1 reply; 26+ messages in thread
From: Hailiang Zhang @ 2016-02-19  8:41 UTC (permalink / raw)
  To: Wen Congyang, Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, qemu devel, Jiang Yunhong,
	Dong Eddie, peter.huangpeng, Michael R. Hines, Max Reitz,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

Hi,

On 2016/2/15 9:13, Wen Congyang wrote:
> On 02/15/2016 08:57 AM, Hailiang Zhang wrote:
>> On 2016/2/5 12:18, Changlong Xie wrote:
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
>>> ---
>>>    Makefile.objs        |  1 +
>>>    qapi/block-core.json | 13 ++++++++
>>>    replication.c        | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    replication.h        | 53 +++++++++++++++++++++++++++++
>>>    4 files changed, 161 insertions(+)
>>>    create mode 100644 replication.c
>>>    create mode 100644 replication.h
>>>
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 06b95c7..a8c74b7 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>>>    block-obj-$(CONFIG_WIN32) += aio-win32.o
>>>    block-obj-y += block/
>>>    block-obj-y += qemu-io-cmds.o
>>> +block-obj-y += replication.o
>>>
>>>    block-obj-m = block/
>>>
>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>> index 7e9e8fe..12362b8 100644
>>> --- a/qapi/block-core.json
>>> +++ b/qapi/block-core.json
>>> @@ -2002,6 +2002,19 @@
>>>                '*read-pattern': 'QuorumReadPattern' } }
>>>
>>>    ##
>>> +# @ReplicationMode
>>> +#
>>> +# An enumeration of replication modes.
>>> +#
>>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>>> +#
>>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
>>> +
>>> +##
>>>    # @BlockdevOptions
>>>    #
>>>    # Options for creating a block device.
>>> diff --git a/replication.c b/replication.c
>>> new file mode 100644
>>> index 0000000..e8ac2f0
>>> --- /dev/null
>>> +++ b/replication.c
>>> @@ -0,0 +1,94 @@
>>> +/*
>>> + * Replication filter
>>> + *
>>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>>> + * Copyright (c) 2016 Intel Corporation
>>> + * Copyright (c) 2016 FUJITSU LIMITED
>>> + *
>>> + * Author:
>>> + *   Wen Congyang <wency@cn.fujitsu.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include "replication.h"
>>> +
>>> +static QLIST_HEAD(, ReplicationState) replication_states;
>>> +
>>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
>>> +{
>>> +    ReplicationState *rs;
>>> +
>>> +    rs = g_new0(ReplicationState, 1);
>>> +    rs->opaque = opaque;
>>> +    rs->ops = ops;
>>> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
>>> +
>>> +    return rs;
>>> +}
>>> +
>>> +void replication_remove(ReplicationState *rs)
>>> +{
>>> +    QLIST_REMOVE(rs, node);
>>> +    g_free(rs);
>>> +}
>>> +
>>> +/*
>>> + * The caller of the function *MUST* make sure vm stopped
>>> + */
>>> +void replication_start_all(ReplicationMode mode, Error **errp)
>>> +{
>>
>> Is this public API is only used for block ?
>> If yes, I'd like it with a 'block_' prefix.
>
> No, we hope it can be used for nic too.
>

OK, i got why you designed these APIs, I like this idea that
use the callback/notifier to notify the status of COLO for block/nic.

But let's do something more graceful.
For COLO, we can consider it has four states:
Prepare/start checkpoint(with VM stopped)/finish checkpoint (with VM resumed)/failover.
So here we need four operation functions according to these four states.
I'd like these public APIs with a 'colo_' prefix,

What about colo_replication_prepare()/colo_replication_start_checkpoint()/

colo_replication_finish_checkpoint()/colo_replication_failover() ?

Besides, It's better to rename the replicaton.c, and move it to the 'migration' folder.
Because it seems to be only used by COLO, maybe colo_ops.c or maybe another choice
move all these codes to colo-comm.c. (If we do this, we need to merge colo-comm.c
first as prerequisite).

Thanks,
Hailiang

>
>>
>>> +    ReplicationState *rs, *next;
>>> +
>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>> +        if (rs->ops && rs->ops->start) {
>>> +            rs->ops->start(rs, mode, errp);
>>> +        }
>>> +        if (*errp != NULL) {
>>
>> This is incorrect, you miss checking if errp is NULL,
>> if errp is NULL, there will be an error that accessing memory at address 0x0.
>> Same with other places in this patch.
>>
>>> +            return;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +void replication_do_checkpoint_all(Error **errp)
>>> +{
>>> +    ReplicationState *rs, *next;
>>> +
>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>> +        if (rs->ops && rs->ops->checkpoint) {
>>> +            rs->ops->checkpoint(rs, errp);
>>> +        }
>>> +        if (*errp != NULL) {
>>> +            return;
>>
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +void replication_get_error_all(Error **errp)
>>> +{
>>> +    ReplicationState *rs, *next;
>>> +
>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>> +        if (rs->ops && rs->ops->get_error) {
>>> +            rs->ops->get_error(rs, errp);
>>> +        }
>>> +        if (*errp != NULL) {
>>
>>> +            return;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +void replication_stop_all(bool failover, Error **errp)
>>> +{
>>> +    ReplicationState *rs, *next;
>>> +
>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>> +        if (rs->ops && rs->ops->stop) {
>>> +            rs->ops->stop(rs, failover, errp);
>>> +        }
>>> +        if (*errp != NULL) {
>>> +            return;
>>> +        }
>>> +    }
>>> +}
>>> diff --git a/replication.h b/replication.h
>>> new file mode 100644
>>> index 0000000..faea649
>>> --- /dev/null
>>> +++ b/replication.h
>>> @@ -0,0 +1,53 @@
>>> +/*
>>> + * Replication filter
>>> + *
>>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>>> + * Copyright (c) 2016 Intel Corporation
>>> + * Copyright (c) 2016 FUJITSU LIMITED
>>> + *
>>> + * Author:
>>> + *   Wen Congyang <wency@cn.fujitsu.com>
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#ifndef REPLICATION_H
>>> +#define REPLICATION_H
>>> +
>>> +#include "sysemu/sysemu.h"
>>> +
>>> +typedef struct ReplicationOps ReplicationOps;
>>> +typedef struct ReplicationState ReplicationState;
>>> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
>>> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
>>> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
>>> +typedef void (*GetError)(ReplicationState *rs, Error **errp);
>>> +
>>> +struct ReplicationState {
>>> +    void *opaque;
>>> +    ReplicationOps *ops;
>>> +    QLIST_ENTRY(ReplicationState) node;
>>> +};
>>> +
>>> +struct ReplicationOps{
>>> +    Start start;
>>> +    Checkpoint checkpoint;
>>> +    GetError get_error;
>>> +    Stop stop;
>>> +};
>>> +
>>> +
>>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
>>> +
>>> +void replication_remove(ReplicationState *rs);
>>> +
>>> +void replication_start_all(ReplicationMode mode, Error **errp);
>>> +
>>> +void replication_do_checkpoint_all(Error **errp);
>>> +
>>> +void replication_get_error_all(Error **errp);
>>> +
>>> +void replication_stop_all(bool failover, Error **errp);
>>> +
>>> +#endif /* REPLICATION_H */
>>>
>>
>>
>>
>>
>> .
>>
>
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-19  8:41       ` Hailiang Zhang
@ 2016-02-19  8:55         ` Wen Congyang
  0 siblings, 0 replies; 26+ messages in thread
From: Wen Congyang @ 2016-02-19  8:55 UTC (permalink / raw)
  To: Hailiang Zhang, Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, qemu devel, Jiang Yunhong,
	Dong Eddie, peter.huangpeng, Michael R. Hines, Max Reitz,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

On 02/19/2016 04:41 PM, Hailiang Zhang wrote:
> Hi,
> 
> On 2016/2/15 9:13, Wen Congyang wrote:
>> On 02/15/2016 08:57 AM, Hailiang Zhang wrote:
>>> On 2016/2/5 12:18, Changlong Xie wrote:
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>>>> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
>>>> ---
>>>>    Makefile.objs        |  1 +
>>>>    qapi/block-core.json | 13 ++++++++
>>>>    replication.c        | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    replication.h        | 53 +++++++++++++++++++++++++++++
>>>>    4 files changed, 161 insertions(+)
>>>>    create mode 100644 replication.c
>>>>    create mode 100644 replication.h
>>>>
>>>> diff --git a/Makefile.objs b/Makefile.objs
>>>> index 06b95c7..a8c74b7 100644
>>>> --- a/Makefile.objs
>>>> +++ b/Makefile.objs
>>>> @@ -15,6 +15,7 @@ block-obj-$(CONFIG_POSIX) += aio-posix.o
>>>>    block-obj-$(CONFIG_WIN32) += aio-win32.o
>>>>    block-obj-y += block/
>>>>    block-obj-y += qemu-io-cmds.o
>>>> +block-obj-y += replication.o
>>>>
>>>>    block-obj-m = block/
>>>>
>>>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>>>> index 7e9e8fe..12362b8 100644
>>>> --- a/qapi/block-core.json
>>>> +++ b/qapi/block-core.json
>>>> @@ -2002,6 +2002,19 @@
>>>>                '*read-pattern': 'QuorumReadPattern' } }
>>>>
>>>>    ##
>>>> +# @ReplicationMode
>>>> +#
>>>> +# An enumeration of replication modes.
>>>> +#
>>>> +# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
>>>> +#
>>>> +# @secondary: Secondary mode, receive the vm's state from primary QEMU.
>>>> +#
>>>> +# Since: 2.6
>>>> +##
>>>> +{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
>>>> +
>>>> +##
>>>>    # @BlockdevOptions
>>>>    #
>>>>    # Options for creating a block device.
>>>> diff --git a/replication.c b/replication.c
>>>> new file mode 100644
>>>> index 0000000..e8ac2f0
>>>> --- /dev/null
>>>> +++ b/replication.c
>>>> @@ -0,0 +1,94 @@
>>>> +/*
>>>> + * Replication filter
>>>> + *
>>>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>>>> + * Copyright (c) 2016 Intel Corporation
>>>> + * Copyright (c) 2016 FUJITSU LIMITED
>>>> + *
>>>> + * Author:
>>>> + *   Wen Congyang <wency@cn.fujitsu.com>
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> + * See the COPYING file in the top-level directory.
>>>> + */
>>>> +
>>>> +#include "replication.h"
>>>> +
>>>> +static QLIST_HEAD(, ReplicationState) replication_states;
>>>> +
>>>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops)
>>>> +{
>>>> +    ReplicationState *rs;
>>>> +
>>>> +    rs = g_new0(ReplicationState, 1);
>>>> +    rs->opaque = opaque;
>>>> +    rs->ops = ops;
>>>> +    QLIST_INSERT_HEAD(&replication_states, rs, node);
>>>> +
>>>> +    return rs;
>>>> +}
>>>> +
>>>> +void replication_remove(ReplicationState *rs)
>>>> +{
>>>> +    QLIST_REMOVE(rs, node);
>>>> +    g_free(rs);
>>>> +}
>>>> +
>>>> +/*
>>>> + * The caller of the function *MUST* make sure vm stopped
>>>> + */
>>>> +void replication_start_all(ReplicationMode mode, Error **errp)
>>>> +{
>>>
>>> Is this public API is only used for block ?
>>> If yes, I'd like it with a 'block_' prefix.
>>
>> No, we hope it can be used for nic too.
>>
> 
> OK, i got why you designed these APIs, I like this idea that
> use the callback/notifier to notify the status of COLO for block/nic.
> 
> But let's do something more graceful.
> For COLO, we can consider it has four states:
> Prepare/start checkpoint(with VM stopped)/finish checkpoint (with VM resumed)/failover.
> So here we need four operation functions according to these four states.

Where is do_checkpoint?

> I'd like these public APIs with a 'colo_' prefix,

I hope this API can also be used by other solution such as microcheckpoint.

> 
> What about colo_replication_prepare()/colo_replication_start_checkpoint()/
> 
> colo_replication_finish_checkpoint()/colo_replication_failover() ?
> 
> Besides, It's better to rename the replicaton.c, and move it to the 'migration' folder.
> Because it seems to be only used by COLO, maybe colo_ops.c or maybe another choice
> move all these codes to colo-comm.c. (If we do this, we need to merge colo-comm.c
> first as prerequisite).

This api is used by block driver, so qemu-nbd, qemu-img also needs it.

Thanks
Wen Congyang

> 
> Thanks,
> Hailiang
> 
>>
>>>
>>>> +    ReplicationState *rs, *next;
>>>> +
>>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>>> +        if (rs->ops && rs->ops->start) {
>>>> +            rs->ops->start(rs, mode, errp);
>>>> +        }
>>>> +        if (*errp != NULL) {
>>>
>>> This is incorrect, you miss checking if errp is NULL,
>>> if errp is NULL, there will be an error that accessing memory at address 0x0.
>>> Same with other places in this patch.
>>>
>>>> +            return;
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +void replication_do_checkpoint_all(Error **errp)
>>>> +{
>>>> +    ReplicationState *rs, *next;
>>>> +
>>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>>> +        if (rs->ops && rs->ops->checkpoint) {
>>>> +            rs->ops->checkpoint(rs, errp);
>>>> +        }
>>>> +        if (*errp != NULL) {
>>>> +            return;
>>>
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +void replication_get_error_all(Error **errp)
>>>> +{
>>>> +    ReplicationState *rs, *next;
>>>> +
>>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>>> +        if (rs->ops && rs->ops->get_error) {
>>>> +            rs->ops->get_error(rs, errp);
>>>> +        }
>>>> +        if (*errp != NULL) {
>>>
>>>> +            return;
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +void replication_stop_all(bool failover, Error **errp)
>>>> +{
>>>> +    ReplicationState *rs, *next;
>>>> +
>>>> +    QLIST_FOREACH_SAFE(rs, &replication_states, node, next) {
>>>> +        if (rs->ops && rs->ops->stop) {
>>>> +            rs->ops->stop(rs, failover, errp);
>>>> +        }
>>>> +        if (*errp != NULL) {
>>>> +            return;
>>>> +        }
>>>> +    }
>>>> +}
>>>> diff --git a/replication.h b/replication.h
>>>> new file mode 100644
>>>> index 0000000..faea649
>>>> --- /dev/null
>>>> +++ b/replication.h
>>>> @@ -0,0 +1,53 @@
>>>> +/*
>>>> + * Replication filter
>>>> + *
>>>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>>>> + * Copyright (c) 2016 Intel Corporation
>>>> + * Copyright (c) 2016 FUJITSU LIMITED
>>>> + *
>>>> + * Author:
>>>> + *   Wen Congyang <wency@cn.fujitsu.com>
>>>> + *
>>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>>> + * See the COPYING file in the top-level directory.
>>>> + */
>>>> +
>>>> +#ifndef REPLICATION_H
>>>> +#define REPLICATION_H
>>>> +
>>>> +#include "sysemu/sysemu.h"
>>>> +
>>>> +typedef struct ReplicationOps ReplicationOps;
>>>> +typedef struct ReplicationState ReplicationState;
>>>> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
>>>> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
>>>> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
>>>> +typedef void (*GetError)(ReplicationState *rs, Error **errp);
>>>> +
>>>> +struct ReplicationState {
>>>> +    void *opaque;
>>>> +    ReplicationOps *ops;
>>>> +    QLIST_ENTRY(ReplicationState) node;
>>>> +};
>>>> +
>>>> +struct ReplicationOps{
>>>> +    Start start;
>>>> +    Checkpoint checkpoint;
>>>> +    GetError get_error;
>>>> +    Stop stop;
>>>> +};
>>>> +
>>>> +
>>>> +ReplicationState *replication_new(void *opaque, ReplicationOps *ops);
>>>> +
>>>> +void replication_remove(ReplicationState *rs);
>>>> +
>>>> +void replication_start_all(ReplicationMode mode, Error **errp);
>>>> +
>>>> +void replication_do_checkpoint_all(Error **errp);
>>>> +
>>>> +void replication_get_error_all(Error **errp);
>>>> +
>>>> +void replication_stop_all(bool failover, Error **errp);
>>>> +
>>>> +#endif /* REPLICATION_H */
>>>>
>>>
>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>>
>> .
>>
> 
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints
  2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
                   ` (10 preceding siblings ...)
  2016-02-17  9:32 ` Hailiang Zhang
@ 2016-02-23  9:10 ` Changlong Xie
  11 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-02-23  9:10 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: zhanghailiang, qemu block, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Michael R. Hines, Gonglei

Ping again ...

Thanks
	-Xie

On 02/05/2016 12:17 PM, Changlong Xie wrote:
> Block replication is a very important feature which is used for
> continuous checkpoints(for example: COLO).
>
> You can get the detailed information about block replication from here:
> http://wiki.qemu.org/Features/BlockReplication
>
> Usage:
> Please refer to docs/block-replication.txt
>
> This patch series is based on the following patch series:
> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html
>
> Patch status:
> 1. Acked patches: none
> 2. Reviewed patches: patch 4
> 3. Updated patches: patch 3, 4, 5, 7, 8
> Note: patch 7 is a new patch that introduces genernal start/stop/checkpoint/
> get_error
> APIs.
>
> You can get the patch here:
> https://github.com/Pating/qemu/tree/changlox/block-replication-v15
>
> You can get the patch with framework here:
> https://github.com/Pating/qemu/tree/changlox/colo_framework_v14
>
> TODO:
> 1. Continuous block replication. It will be started after basic functions
>     are accepted.
>
> Changs Log:
> V15:
> 1. Rebase to the newest codes
> 2. Fix typos and coding style addresed Eric's comments
> 3. Address Stefan's comments
>     1) Make backup_do_checkpoint public, drop the changes on BlockJobDriver
>     2) Update the message and description for [PATCH 4/9]
>     3) Make replication_(start/stop/do_checkpoint)_all as global interfaces
>     4) Introduce AioContext lock to protect start/stop/do_checkpoint callbacks
>     5) Use BdrvChild instead of holding on to BlockDriverState * pointers
> 4. Clear BDRV_O_INACTIVE for hidden disk's open_flags since commit 09e0c771
> 5. Introduce replication_get_error_all to check replication status
> 6. Remove useless discard interface
> V14:
> 1. Implement auto complete active commit
> 2. Implement active commit block job for replication.c
> 3. Address the comments from Stefan, add replication-specific API and data
>     structure, also remove old block layer APIs
> V13:
> 1. Rebase to the newest codes
> 2. Remove redundant marcos and semicolon in replication.c
> 3. Fix typos in block-replication.txt
> V12:
> 1. Rebase to the newest codes
> 2. Use backing reference to replcace 'allow-write-backing-file'
> V11:
> 1. Reopen the backing file when starting blcok replication if it is not
>     opened in R/W mode
> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
>     when opening backing file
> 3. Block the top BDS so there is only one block job for the top BDS and
>     its backing chain.
> V10:
> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
>     reference.
> 2. Address the comments from Eric Blake
> V9:
> 1. Update the error messages
> 2. Rebase to the newest qemu
> 3. Split child add/delete support. These patches are sent in another patchset.
> V8:
> 1. Address Alberto Garcia's comments
> V7:
> 1. Implement adding/removing quorum child. Remove the option non-connect.
> 2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
> V6:
> 1. Rebase to the newest qemu.
> V5:
> 1. Address the comments from Gong Lei
> 2. Speed the failover up. The secondary vm can take over very quickly even
>     if there are too many I/O requests.
> V4:
> 1. Introduce a new driver replication to avoid touch nbd and qcow2.
> V3:
> 1: use error_setg() instead of error_set()
> 2. Add a new block job API
> 3. Active disk, hidden disk and nbd target uses the same AioContext
> 4. Add a testcase to test new hbitmap API
> V2:
> 1. Redesign the secondary qemu(use image-fleecing)
> 2. Use Error objects to return error message
> 3. Address the comments from Max Reitz and Eric Blake
>
> Changlong Xie (1):
>    Introduce new APIs to do replication operation
>
> Wen Congyang (8):
>    unblock backup operations in backing file
>    Store parent BDS in BdrvChild
>    Backup: clear all bitmap when doing block checkpoint
>    Link backup into block core
>    docs: block replication's description
>    auto complete active commit
>    Implement new driver for block replication
>    support replication driver in blockdev-add
>
>   Makefile.objs              |   1 +
>   block.c                    |  19 ++
>   block/Makefile.objs        |   3 +-
>   block/backup.c             |  15 ++
>   block/mirror.c             |  13 +-
>   block/replication.c        | 594 +++++++++++++++++++++++++++++++++++++++++++++
>   blockdev.c                 |   2 +-
>   docs/block-replication.txt | 238 ++++++++++++++++++
>   include/block/block_int.h  |   6 +-
>   qapi/block-core.json       |  33 ++-
>   qemu-img.c                 |   2 +-
>   replication.c              |  94 +++++++
>   replication.h              |  53 ++++
>   13 files changed, 1063 insertions(+), 10 deletions(-)
>   create mode 100644 block/replication.c
>   create mode 100644 docs/block-replication.txt
>   create mode 100644 replication.c
>   create mode 100644 replication.h
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation Changlong Xie
  2016-02-15  0:57   ` Hailiang Zhang
@ 2016-03-04 16:13   ` Stefan Hajnoczi
  2016-03-08  3:08     ` Changlong Xie
  1 sibling, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2016-03-04 16:13 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]

On Fri, Feb 05, 2016 at 12:18:06PM +0800, Changlong Xie wrote:
> diff --git a/replication.h b/replication.h
> new file mode 100644
> index 0000000..faea649
> --- /dev/null
> +++ b/replication.h
> @@ -0,0 +1,53 @@
> +/*
> + * Replication filter
> + *
> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
> + * Copyright (c) 2016 Intel Corporation
> + * Copyright (c) 2016 FUJITSU LIMITED
> + *
> + * Author:
> + *   Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef REPLICATION_H
> +#define REPLICATION_H
> +
> +#include "sysemu/sysemu.h"
> +
> +typedef struct ReplicationOps ReplicationOps;
> +typedef struct ReplicationState ReplicationState;
> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
> +typedef void (*GetError)(ReplicationState *rs, Error **errp);

These typedefs are likely to collide with system headers.  Please prefix
the name.

That said, I don't think Start, Stop, Checkpoint, GetError are
necessary.  Just declare the prototypes in struct ReplicationOps.  No
user passes single function pointers, they always pass ReplicationOps.
Therefore it's not necessary to declare types for these functions at
all.

> +
> +struct ReplicationState {
> +    void *opaque;
> +    ReplicationOps *ops;
> +    QLIST_ENTRY(ReplicationState) node;
> +};
> +
> +struct ReplicationOps{
> +    Start start;
> +    Checkpoint checkpoint;
> +    GetError get_error;
> +    Stop stop;
> +};

Please add doc comments that explain the semantics of these functions.
For examples, see include/qom/object.h.  This documentation should allow
someone implementing a new replication listener to understand the
constraints and the purpose of these functions.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication Changlong Xie
@ 2016-03-04 17:39   ` Stefan Hajnoczi
  2016-03-08  3:30     ` Changlong Xie
  2016-03-09  3:29     ` Changlong Xie
  2016-03-04 17:53   ` Stefan Hajnoczi
  1 sibling, 2 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2016-03-04 17:39 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 1968 bytes --]

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
> +static void replication_start(ReplicationState *rs, ReplicationMode mode,
> +                              Error **errp)
> +{
> +    BlockDriverState *bs = rs->opaque;
> +    BDRVReplicationState *s = bs->opaque;
> +    int64_t active_length, hidden_length, disk_length;
> +    AioContext *aio_context;
> +    Error *local_err = NULL;
> +
> +    if (s->replication_state != BLOCK_REPLICATION_NONE) {

Dereferencing bs and s is not thread-safe since the AioContext isn't
acquired here.  Unless you have other locking rules, the default is that
everything in BlockDriverState and bs->opaque must be accessed with the
AioContext acquired.

Please apply this comment to the rest of the patch series, too.
Functions that are not part of BlockDriver are probably called outside
the AioContext and must acquire it before they are allowed to access
anything else in bs.

> +        s->top_bs = get_top_bs(bs);
> +        if (!s->top_bs) {
> +            error_setg(errp, "Cannot get the top block driver state to do"
> +                       " internal backup");
> +            return;
> +        }

Traversing the BDS graph using the parent pointer is not safe - you
cannot stash the parent pointer because it changes if a parent node is
inserted/removed.

I suggest passing the drive name as an argument so that bdrv_lookup_bs()
can be used when installing the op blocker.  Then the BdrvChild.parent
pointer patch and get_top_bs() can be deleted.

Jeff Cody is currently working on a new op blocker system.  Hopefully
this system will allow you to install blockers on bs instead of on the
drive's top BDS.  But let's not worry about that for now and just use
the drive name as an argument.

> +        /*
> +         * Must protect backup target if backup job was stopped/cancelled
> +         * unexpectedly
> +         */
> +        bdrv_ref(s->hidden_disk->bs);

Where is the matching bdrv_unref() call?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication Changlong Xie
  2016-03-04 17:39   ` Stefan Hajnoczi
@ 2016-03-04 17:53   ` Stefan Hajnoczi
  2016-03-08  3:31     ` Changlong Xie
  2016-03-09 10:35     ` Changlong Xie
  1 sibling, 2 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2016-03-04 17:53 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    if (!s->secondary_disk->bs->job) {
> +        error_setg(errp, "Backup job was cancelled unexpectedly");
> +        return;
> +    }

Do you have test cases for the replication driver?

This error code path seems like something that should be exercised to
make sure it works.

Basic start/checkpoint/stop and checks that the active/secondary/hidden
disks contain the expected data are also important.

Probably the easiest way to do this would be a file called
tests/test-replication.c that calls start/checkpoint/stop and read/write
functions directly instead of running a guest.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH v15 7/9] Introduce new APIs to do replication operation
  2016-03-04 16:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2016-03-08  3:08     ` Changlong Xie
  0 siblings, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-03-08  3:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert,
	zhanghailiang

On 03/05/2016 12:13 AM, Stefan Hajnoczi wrote:
> On Fri, Feb 05, 2016 at 12:18:06PM +0800, Changlong Xie wrote:
>> diff --git a/replication.h b/replication.h
>> new file mode 100644
>> index 0000000..faea649
>> --- /dev/null
>> +++ b/replication.h
>> @@ -0,0 +1,53 @@
>> +/*
>> + * Replication filter
>> + *
>> + * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
>> + * Copyright (c) 2016 Intel Corporation
>> + * Copyright (c) 2016 FUJITSU LIMITED
>> + *
>> + * Author:
>> + *   Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#ifndef REPLICATION_H
>> +#define REPLICATION_H
>> +
>> +#include "sysemu/sysemu.h"
>> +
>> +typedef struct ReplicationOps ReplicationOps;
>> +typedef struct ReplicationState ReplicationState;
>> +typedef void (*Start)(ReplicationState *rs, ReplicationMode mode, Error **errp);
>> +typedef void (*Stop)(ReplicationState *rs, bool failover, Error **errp);
>> +typedef void (*Checkpoint)(ReplicationState *rs, Error **errp);
>> +typedef void (*GetError)(ReplicationState *rs, Error **errp);
>

Hi Stefan

Thanks for your comments.

> These typedefs are likely to collide with system headers.  Please prefix
> the name.
>
> That said, I don't think Start, Stop, Checkpoint, GetError are
> necessary.  Just declare the prototypes in struct ReplicationOps.  No
> user passes single function pointers, they always pass ReplicationOps.
> Therefore it's not necessary to declare types for these functions at
> all.

Will try that.

>
>> +
>> +struct ReplicationState {
>> +    void *opaque;
>> +    ReplicationOps *ops;
>> +    QLIST_ENTRY(ReplicationState) node;
>> +};
>> +
>> +struct ReplicationOps{
>> +    Start start;
>> +    Checkpoint checkpoint;
>> +    GetError get_error;
>> +    Stop stop;
>> +};
>
> Please add doc comments that explain the semantics of these functions.
> For examples, see include/qom/object.h.  This documentation should allow
> someone implementing a new replication listener to understand the
> constraints and the purpose of these functions.

Surely


Thanks
	-Xie
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-03-04 17:39   ` Stefan Hajnoczi
@ 2016-03-08  3:30     ` Changlong Xie
  2016-03-09  3:29     ` Changlong Xie
  1 sibling, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-03-08  3:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

On 03/05/2016 01:39 AM, Stefan Hajnoczi wrote:
> On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
>> +static void replication_start(ReplicationState *rs, ReplicationMode mode,
>> +                              Error **errp)
>> +{
>> +    BlockDriverState *bs = rs->opaque;
>> +    BDRVReplicationState *s = bs->opaque;
>> +    int64_t active_length, hidden_length, disk_length;
>> +    AioContext *aio_context;
>> +    Error *local_err = NULL;
>> +
>> +    if (s->replication_state != BLOCK_REPLICATION_NONE) {
>
> Dereferencing bs and s is not thread-safe since the AioContext isn't
> acquired here.  Unless you have other locking rules, the default is that
> everything in BlockDriverState and bs->opaque must be accessed with the
> AioContext acquired.
>
> Please apply this comment to the rest of the patch series, too.
> Functions that are not part of BlockDriver are probably called outside
> the AioContext and must acquire it before they are allowed to access
> anything else in bs.
>

Will do implement in replication_start/stop/checkpoint/get_error callbacks.

>> +        s->top_bs = get_top_bs(bs);
>> +        if (!s->top_bs) {
>> +            error_setg(errp, "Cannot get the top block driver state to do"
>> +                       " internal backup");
>> +            return;
>> +        }
>
> Traversing the BDS graph using the parent pointer is not safe - you
> cannot stash the parent pointer because it changes if a parent node is
> inserted/removed.
>
> I suggest passing the drive name as an argument so that bdrv_lookup_bs()
> can be used when installing the op blocker.  Then the BdrvChild.parent
> pointer patch and get_top_bs() can be deleted.
>

ok

> Jeff Cody is currently working on a new op blocker system.  Hopefully
> this system will allow you to install blockers on bs instead of on the
> drive's top BDS.  But let's not worry about that for now and just use
> the drive name as an argument.

Good news.

Thanks

>
>> +        /*
>> +         * Must protect backup target if backup job was stopped/cancelled
>> +         * unexpectedly
>> +         */
>> +        bdrv_ref(s->hidden_disk->bs);
>
> Where is the matching bdrv_unref() call?
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-03-04 17:53   ` Stefan Hajnoczi
@ 2016-03-08  3:31     ` Changlong Xie
  2016-03-09 10:35     ` Changlong Xie
  1 sibling, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-03-08  3:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

On 03/05/2016 01:53 AM, Stefan Hajnoczi wrote:
> On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
>> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    int ret;
>> +
>> +    if (!s->secondary_disk->bs->job) {
>> +        error_setg(errp, "Backup job was cancelled unexpectedly");
>> +        return;
>> +    }
>
> Do you have test cases for the replication driver?
>
> This error code path seems like something that should be exercised to
> make sure it works.
>
> Basic start/checkpoint/stop and checks that the active/secondary/hidden
> disks contain the expected data are also important.
>
> Probably the easiest way to do this would be a file called
> tests/test-replication.c that calls start/checkpoint/stop and read/write
> functions directly instead of running a guest.

Will do that in next version.

Thanks
	-Xie
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-03-04 17:39   ` Stefan Hajnoczi
  2016-03-08  3:30     ` Changlong Xie
@ 2016-03-09  3:29     ` Changlong Xie
  1 sibling, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-03-09  3:29 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

On 03/05/2016 01:39 AM, Stefan Hajnoczi wrote:
> On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
>> +static void replication_start(ReplicationState *rs, ReplicationMode mode,
>> +                              Error **errp)
>> +{
>> +    BlockDriverState *bs = rs->opaque;
>> +    BDRVReplicationState *s = bs->opaque;
>> +    int64_t active_length, hidden_length, disk_length;
>> +    AioContext *aio_context;
>> +    Error *local_err = NULL;
>> +
>> +    if (s->replication_state != BLOCK_REPLICATION_NONE) {
>
> Dereferencing bs and s is not thread-safe since the AioContext isn't
> acquired here.  Unless you have other locking rules, the default is that
> everything in BlockDriverState and bs->opaque must be accessed with the
> AioContext acquired.
>
> Please apply this comment to the rest of the patch series, too.
> Functions that are not part of BlockDriver are probably called outside
> the AioContext and must acquire it before they are allowed to access
> anything else in bs.
>
>> +        s->top_bs = get_top_bs(bs);
>> +        if (!s->top_bs) {
>> +            error_setg(errp, "Cannot get the top block driver state to do"
>> +                       " internal backup");
>> +            return;
>> +        }
>
> Traversing the BDS graph using the parent pointer is not safe - you
> cannot stash the parent pointer because it changes if a parent node is
> inserted/removed.
>
> I suggest passing the drive name as an argument so that bdrv_lookup_bs()
> can be used when installing the op blocker.  Then the BdrvChild.parent
> pointer patch and get_top_bs() can be deleted.
>
> Jeff Cody is currently working on a new op blocker system.  Hopefully
> this system will allow you to install blockers on bs instead of on the
> drive's top BDS.  But let's not worry about that for now and just use
> the drive name as an argument.
>
>> +        /*
>> +         * Must protect backup target if backup job was stopped/cancelled
>> +         * unexpectedly
>> +         */
>> +        bdrv_ref(s->hidden_disk->bs);
>
> Where is the matching bdrv_unref() call?

Two conditions
1. job is cancelled, so "local_err != 0"

449         if (local_err) {
450             error_propagate(errp, local_err);
451             backup_job_cleanup(s);
452             bdrv_unref(s->hidden_disk->bs);
453             return;
454         }

2. backup completed
	backup_complete
		bdrv_unref(s->target);

If i'm wrong, pls correct me.

Thanks
	-Xie

>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication
  2016-03-04 17:53   ` Stefan Hajnoczi
  2016-03-08  3:31     ` Changlong Xie
@ 2016-03-09 10:35     ` Changlong Xie
  1 sibling, 0 replies; 26+ messages in thread
From: Changlong Xie @ 2016-03-09 10:35 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Dr. David Alan Gilbert

On 03/05/2016 01:53 AM, Stefan Hajnoczi wrote:
> On Fri, Feb 05, 2016 at 12:18:07PM +0800, Changlong Xie wrote:
>> +static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    int ret;
>> +
>> +    if (!s->secondary_disk->bs->job) {
>> +        error_setg(errp, "Backup job was cancelled unexpectedly");
>> +        return;
>> +    }
>
> Do you have test cases for the replication driver?
>
> This error code path seems like something that should be exercised to
> make sure it works.
>
> Basic start/checkpoint/stop and checks that the active/secondary/hidden
> disks contain the expected data are also important.
>
> Probably the easiest way to do this would be a file called
> tests/test-replication.c that calls start/checkpoint/stop and read/write
> functions directly instead of running a guest.
>

Hi Stefan

I'm working on the testcase now, and encount some issues.

Replication is different from other block drivers, we just use it for 
COLO live migration what means "primary" and "secondary" *MUST*
co-work with each other.

(1)
There are two test "prototype":
1 "COLO prototype":
     simulate COLO environment, run two guests at the same time and 
setup nbd on both 'primary'/'secondary' side.

2 "Separate prototype"
     test relevant APIs in separate 'primary'/'secondary' mode, so we
don't need run two guests and setup nbd on both side

I prefer *"Separate prototype"*. Which one do you prefer?

(2)
Since we have replication_(start/stop/do_checkpoint/get_error)_all APIs, 
i know how to test start/checkpoint/stop/get_error callbacks.

But I'm confused about how to test read/write of replication, are there 
some ready-made APIs for qemu test system?

I know, for interpreted languages, we can use:
  $QEMU_IO -s -c "read 0 $size" "$TEST_IMG"
  $QEMU_IO -s -c "write -P 0xa 0 $size" "$TEST_IMG"

Also are some APIs to check disks contents(we need check 
active/secondary/hidden disk contents)?

Thanks
	-Xie

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2016-03-09 10:33 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-05  4:17 [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 1/9] unblock backup operations in backing file Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 2/9] Store parent BDS in BdrvChild Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 3/9] Backup: clear all bitmap when doing block checkpoint Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 4/9] Link backup into block core Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 5/9] docs: block replication's description Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 6/9] auto complete active commit Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 7/9] Introduce new APIs to do replication operation Changlong Xie
2016-02-15  0:57   ` Hailiang Zhang
2016-02-15  1:13     ` Wen Congyang
2016-02-19  8:41       ` Hailiang Zhang
2016-02-19  8:55         ` Wen Congyang
2016-02-15  2:25     ` Changlong Xie
2016-03-04 16:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-03-08  3:08     ` Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 8/9] Implement new driver for block replication Changlong Xie
2016-03-04 17:39   ` Stefan Hajnoczi
2016-03-08  3:30     ` Changlong Xie
2016-03-09  3:29     ` Changlong Xie
2016-03-04 17:53   ` Stefan Hajnoczi
2016-03-08  3:31     ` Changlong Xie
2016-03-09 10:35     ` Changlong Xie
2016-02-05  4:18 ` [Qemu-devel] [PATCH v15 9/9] support replication driver in blockdev-add Changlong Xie
2016-02-15  7:02 ` [Qemu-devel] [PATCH v15 0/9] Block replication for continuous checkpoints Changlong Xie
2016-02-17  9:32 ` Hailiang Zhang
2016-02-23  9:10 ` Changlong Xie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).