* [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
@ 2015-12-25 10:30 Changlong Xie
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

Block replication is an important feature used for continuous
checkpoints (for example, COLO).

Detailed information about block replication is available here:
http://wiki.qemu.org/Features/BlockReplication

Usage:
Please refer to docs/block-replication.txt

This patch series is based on the following patch series:
1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html

You can get the patch here:
https://github.com/Pating/qemu/tree/changlox/block-replication-v13

You can get the patch with framework here:
https://github.com/Pating/qemu/tree/changlox/colo_framework_v12

TODO:
1. Continuous block replication. Work on it will start once the basic
   functions are accepted.

Change Log:
V13:
1. Rebase to the newest codes
2. Remove redundant macros and semicolons in replication.c
3. Fix typos in block-replication.txt
V12:
1. Rebase to the newest codes
2. Use backing reference to replace 'allow-write-backing-file'
V11:
1. Reopen the backing file when starting block replication if it is not
   opened in R/W mode
2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
   when opening backing file
3. Block the top BDS so there is only one block job for the top BDS and
   its backing chain.
V10:
1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
   reference.
2. Address the comments from Eric Blake
V9:
1. Update the error messages
2. Rebase to the newest qemu
3. Split child add/delete support. These patches are sent in another patchset.
V8:
1. Address Alberto Garcia's comments
V7:
1. Implement adding/removing quorum child. Remove the option non-connect.
2. Simplify the backing reference option according to Stefan Hajnoczi's suggestion
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed up failover. The secondary VM can take over very quickly even
   if there are many I/O requests.
V4:
1. Introduce a new driver, replication, to avoid touching nbd and qcow2.
V3:
1. Use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target use the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary QEMU (using image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (10):
  unblock backup operations in backing file
  Store parent BDS in BdrvChild
  Backup: clear all bitmap when doing block checkpoint
  Allow creating backup jobs when opening BDS
  docs: block replication's description
  Add new block driver interfaces to control block replication
  quorum: implement block driver interfaces for block replication
  Implement new driver for block replication
  support replication driver in blockdev-add
  Add a new API to start/stop replication, do checkpoint to all BDSes

 block.c                    | 145 ++++++++++++
 block/Makefile.objs        |   3 +-
 block/backup.c             |  14 ++
 block/quorum.c             |  78 +++++++
 block/replication.c        | 545 +++++++++++++++++++++++++++++++++++++++++++++
 blockjob.c                 |  11 +
 docs/block-replication.txt | 227 +++++++++++++++++++
 include/block/block.h      |   9 +
 include/block/block_int.h  |  15 ++
 include/block/blockjob.h   |  12 +
 qapi/block-core.json       |  33 ++-
 11 files changed, 1089 insertions(+), 3 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
1.9.3


* [Qemu-devel] [PATCH v13 01/10] unblock backup operations in backing file
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index b9e99da..1589c0d 100644
--- a/block.c
+++ b/block.c
@@ -1275,6 +1275,24 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
     /* Otherwise we won't be able to commit due to check in bdrv_commit */
     bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
                     bs->backing_blocker);
+    /*
+     * We do backup in 3 ways:
+     * 1. drive backup
+     *    The target bs is newly opened, and the source is the top BDS.
+     * 2. blockdev backup
+     *    Both the source and the target are top BDSes.
+     * 3. internal backup (used for block replication)
+     *    Both the source and the target are backing files.
+     *
+     * In cases 1 and 2, the backing file is neither the source nor
+     * the target.
+     * In case 3, we will block the top BDS, so there is only one block
+     * job for the top BDS and its backing chain.
+     */
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
 out:
     bdrv_refresh_limits(bs, NULL);
 }
-- 
1.9.3
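A toy model of the op-blocker bookkeeping this patch relies on, offered as a hedged sketch: each node keeps a per-operation list of blocker reasons, and an operation is allowed only while that list is empty. Unblocking BACKUP_SOURCE and BACKUP_TARGET for the backing_blocker reason leaves every other op type still blocked. All names here (ToyBDS, toy_op_block_all() and so on) are hypothetical; QEMU's real bdrv_op_block()/bdrv_op_unblock() keep Error * reasons in per-type lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of op types, loosely mirroring BLOCK_OP_TYPE_*. */
typedef enum {
    TOY_OP_COMMIT_TARGET,
    TOY_OP_BACKUP_SOURCE,
    TOY_OP_BACKUP_TARGET,
    TOY_OP_MAX
} ToyOpType;

#define TOY_MAX_BLOCKERS 4

/* Per op type, the reasons that currently block it. */
typedef struct {
    const void *blockers[TOY_OP_MAX][TOY_MAX_BLOCKERS];
    int nblockers[TOY_OP_MAX];
} ToyBDS;

/* Block every op type with one reason, like bs->backing_blocker. */
static void toy_op_block_all(ToyBDS *bs, const void *reason)
{
    for (int op = 0; op < TOY_OP_MAX; op++) {
        assert(bs->nblockers[op] < TOY_MAX_BLOCKERS);
        bs->blockers[op][bs->nblockers[op]++] = reason;
    }
}

/* Drop every entry for @reason from one op type; other reasons are kept. */
static void toy_op_unblock(ToyBDS *bs, ToyOpType op, const void *reason)
{
    int kept = 0;

    for (int i = 0; i < bs->nblockers[op]; i++) {
        if (bs->blockers[op][i] != reason) {
            bs->blockers[op][kept++] = bs->blockers[op][i];
        }
    }
    bs->nblockers[op] = kept;
}

/* An operation is allowed only when no reason blocks it. */
static bool toy_op_is_blocked(const ToyBDS *bs, ToyOpType op)
{
    return bs->nblockers[op] > 0;
}
```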


* [Qemu-devel] [PATCH v13 02/10] Store parent BDS in BdrvChild
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

We need to access the parent BDS to get the root BDS.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block.c                   | 1 +
 include/block/block_int.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block.c b/block.c
index 1589c0d..c9c913e 100644
--- a/block.c
+++ b/block.c
@@ -1204,6 +1204,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
     BdrvChild *child = g_new(BdrvChild, 1);
     *child = (BdrvChild) {
         .bs     = child_bs,
+        .parent = parent_bs,
         .name   = g_strdup(child_name),
         .role   = child_role,
     };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index ebe8b1e..19c02b6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -361,6 +361,7 @@ extern const BdrvChildRole child_format;
 
 struct BdrvChild {
     BlockDriverState *bs;
+    BlockDriverState *parent;
     char *name;
     const BdrvChildRole *role;
     QLIST_ENTRY(BdrvChild) next;
-- 
1.9.3
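As a hedged illustration of why the parent pointer helps: once every child link records its parent, the root can be reached by walking upward. The types below are hypothetical, and the sketch assumes each node is referenced by at most one child link; a real BDS can have several parents, so this only shows the idea.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyNode ToyNode;

/* Link from a parent node to a child node, like BdrvChild with .parent. */
typedef struct {
    ToyNode *bs;      /* the child node */
    ToyNode *parent;  /* the node that owns this link */
} ToyChild;

struct ToyNode {
    const char *name;
    ToyChild *up;     /* hypothetical: the link pointing at this node, NULL at the root */
};

/* Follow parent pointers until a node with no incoming link is found. */
static ToyNode *toy_get_root(ToyNode *bs)
{
    assert(bs != NULL);
    while (bs->up != NULL) {
        assert(bs->up->bs == bs);   /* the link must really point at this node */
        bs = bs->up->parent;
    }
    return bs;
}
```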


* [Qemu-devel] [PATCH v13 03/10] Backup: clear all bitmap when doing block checkpoint
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/backup.c           | 14 ++++++++++++++
 blockjob.c               | 11 +++++++++++
 include/block/blockjob.h | 12 ++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index 705bb77..0a27d01 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -253,11 +253,25 @@ static void backup_abort(BlockJob *job)
     }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
     .set_speed      = backup_set_speed,
     .iostatus_reset = backup_iostatus_reset,
+    .do_checkpoint  = backup_do_checkpoint,
     .commit         = backup_commit,
     .abort          = backup_abort,
 };
diff --git a/blockjob.c b/blockjob.c
index 80adb9d..0c8edfe 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -533,3 +533,14 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
     QLIST_INSERT_HEAD(&txn->jobs, job, txn_list);
     block_job_txn_ref(txn);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+    if (!job->driver->do_checkpoint) {
+        error_setg(errp, "The job %s doesn't support block checkpoint",
+                   BlockJobType_lookup[job->driver->job_type]);
+        return;
+    }
+
+    job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..abdba7c 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -70,6 +70,9 @@ typedef struct BlockJobDriver {
      * never both.
      */
     void (*abort)(BlockJob *job);
+
+    /** Optional callback for job types that support checkpoint. */
+    void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -443,4 +446,13 @@ void block_job_txn_unref(BlockJobTxn *txn);
  */
 void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
1.9.3
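A hedged sketch of why resetting the bitmap is enough for a checkpoint: in sync=none mode the backup job copies a cluster's old contents to the target just before the first write to that cluster, and the bitmap records which clusters were already copied. Clearing it makes the first write after each checkpoint copy again. This is a toy model with hypothetical names: a bool array stands in for HBitmap, one byte per cluster stands in for the disks, and emptying the target only mirrors the hidden disk being made empty at checkpoint time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_CLUSTERS 8

/* Toy disks: one byte per cluster; 0 in target means "not allocated". */
typedef struct {
    unsigned char source[TOY_CLUSTERS];  /* e.g. the secondary disk */
    unsigned char target[TOY_CLUSTERS];  /* e.g. the hidden disk */
    bool copied[TOY_CLUSTERS];           /* stands in for the job's HBitmap */
} ToyBackup;

/* Copy-before-write: save the old cluster once, then apply the write. */
static void toy_write(ToyBackup *b, int cluster, unsigned char data)
{
    assert(cluster >= 0 && cluster < TOY_CLUSTERS);
    if (!b->copied[cluster]) {
        b->target[cluster] = b->source[cluster];
        b->copied[cluster] = true;
    }
    b->source[cluster] = data;
}

/* Checkpoint: forget what was copied (like hbitmap_reset_all()) and empty the target. */
static void toy_checkpoint(ToyBackup *b)
{
    memset(b->copied, 0, sizeof(b->copied));
    memset(b->target, 0, sizeof(b->target));
}
```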


* [Qemu-devel] [PATCH v13 04/10] Allow creating backup jobs when opening BDS
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 58ef2ef..fa05f37 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
-- 
1.9.3


* [Qemu-devel] [PATCH v13 05/10] docs: block replication's description
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 docs/block-replication.txt | 227 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..73abb65
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,227 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COarse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied to FT/HA (Fault-tolerance/High Assurance) scenarios,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following diagram shows the block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content (it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic building
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none 6
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The new block filter (named replication) will control block
+replication.
+
+3) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+4) The disk on the secondary is represented by a custom block device
+(called active-disk). It should start as an empty disk, and the format
+should support bdrv_make_empty() and backing files.
+
+5) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also start as an empty disk,
+and the driver must support bdrv_make_empty() and backing files.
+
+6) The drive-backup job (sync=none) is run to allow hidden-disk to buffer
+any state that would otherwise be lost by the speculative write-through
+of the NBD server into the secondary disk. So before block replication starts,
+the primary disk and secondary disk should contain the same data.
+
+== Failure Handling ==
+Six kinds of internal error can occur while block replication is running:
+1. I/O error on primary disk
+2. Forwarding primary write requests failed
+3. Backup failed
+4. I/O error on secondary disk
+5. I/O error on active disk
+6. Making active disk or hidden disk empty failed
+In cases 1 and 5, we just report the error to the disk layer. In cases 2, 3,
+4 and 6, we just report block replication's error to the FT/HA manager (which
+decides when to do a new checkpoint and when to fail over).
+There is no internal error when doing failover.
+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. bdrv_start_replication()
+   Start block replication; called in the migration/checkpoint thread.
+   We must call bdrv_start_replication() in secondary QEMU before
+   calling bdrv_start_replication() in primary QEMU. The caller
+   must hold the I/O mutex lock if it is in the migration/checkpoint
+   thread.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state is transferred to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in the migration/checkpoint
+   thread.
+c. bdrv_stop_replication()
+   It is called on failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. If this API is used for
+   anything other than failover (e.g. to shut down the guest), the VM
+   should be stopped before calling it. The caller must hold the I/O
+   mutex lock if it is in the migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,id=colo1,vote-threshold=1,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw
+
+  Run qmp command in primary qemu:
+    { 'execute': 'human-monitor-command',
+      'arguments': {
+          'command-line': 'drive_add buddy driver=replication,mode=primary,file.driver=nbd,file.host=xxxx,file.port=xxxx,file.export=colo1,node-name=nbd_client1,if=none'
+      }
+    }
+    { 'execute': 'x-blockdev-change',
+      'arguments': {
+          'parent': 'colo1',
+          'node': 'nbd_client1'
+      }
+    }
+  Note:
+  1. There should be only one NBD Client for each primary disk.
+  2. host is the secondary physical machine's hostname or IP address.
+  3. Each disk must have its own export name.
+  4. It is all a single argument to -drive and you should ignore the
+     leading whitespace.
+  5. These qmp commands must be run after the qmp commands in the
+     secondary QEMU.
+
+Secondary:
+  -drive if=none,driver=raw,file.filename=1.raw,id=colo1 \
+  -drive if=xxx,driver=replication,mode=secondary,\
+         file.file.filename=active_disk.qcow2,\
+         file.driver=qcow2,\
+         file.backing.file.filename=hidden_disk.qcow2,\
+         file.backing.driver=qcow2,\
+         file.backing.backing=colo1
+
+  Then run qmp command in secondary qemu:
+    { 'execute': 'nbd-server-start',
+      'arguments': {
+          'addr': {
+              'type': 'inet',
+              'data': {
+                  'host': 'xxx',
+                  'port': 'xxx'
+              }
+          }
+      }
+    }
+    { 'execute': 'nbd-server-add',
+      'arguments': {
+          'device': 'colo1',
+          'writable': true
+      }
+    }
+
+  Note:
+  1. The export name in secondary QEMU command line is the secondary
+     disk's id.
+  2. The export name for the same disk must be the same.
+  3. The qmp commands nbd-server-start and nbd-server-add must be run
+     before running the qmp command migrate on the primary QEMU.
+  4. The active disk, hidden disk and nbd target should have the same
+     length.
+  5. It is better to put active disk and hidden disk in ramdisk.
+  6. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
+
+After Failover:
+Primary:
+  The secondary host is down, so we should run the following qmp command
+  to remove the nbd child from the quorum:
+  { 'execute': 'x-blockdev-change',
+    'arguments': {
+        'parent': 'colo1',
+        'child': 'children.1'
+    }
+  }
+  Note: there is currently no qmp command to remove the blockdev.
+
+Secondary:
+  The primary host is down, so we should run the following qmp command:
+  { 'execute': 'nbd-server-stop' }
+
+TODO:
+1. Continuous block replication
+2. Shared disk
-- 
1.9.3
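The Disk buffer rules from the Workflow section (steps 2 to 4) can be checked with a small per-sector model. This is a hypothetical sketch, not the replication driver: a single buffer with a valid flag per sector stands in for the active and hidden disks, and the Secondary VM is assumed to read from the buffer when it holds the sector and from the Secondary Disk otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NSECTORS 8

typedef struct {
    unsigned char disk[NSECTORS];  /* Secondary Disk */
    unsigned char buf[NSECTORS];   /* Disk buffer contents */
    bool valid[NSECTORS];          /* true if the buffer holds this sector */
} ToySecondary;

/* Steps 1-3: a forwarded Primary write request. */
static void primary_write(ToySecondary *s, int sector, unsigned char data)
{
    assert(sector >= 0 && sector < NSECTORS);
    if (!s->valid[sector]) {       /* step 2: COW, never overwrite the buffer */
        s->buf[sector] = s->disk[sector];
        s->valid[sector] = true;
    }
    s->disk[sector] = data;        /* step 3: speculative write-through */
}

/* Step 4: a Secondary write always overwrites the buffered sector. */
static void secondary_write(ToySecondary *s, int sector, unsigned char data)
{
    assert(sector >= 0 && sector < NSECTORS);
    s->buf[sector] = data;
    s->valid[sector] = true;
}

/* The Secondary VM's view: buffer first, then the Secondary Disk. */
static unsigned char secondary_read(const ToySecondary *s, int sector)
{
    assert(sector >= 0 && sector < NSECTORS);
    return s->valid[sector] ? s->buf[sector] : s->disk[sector];
}

/* Checkpoint: drop the Disk buffer; reads now see the Primary's data. */
static void buffer_checkpoint(ToySecondary *s)
{
    memset(s->valid, 0, sizeof(s->valid));
}
```

Keeping the first copy in step 2 is what preserves a Secondary write to the same sector until the next checkpoint.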


* [Qemu-devel] [PATCH v13 06/10] Add new block driver interfaces to control block replication
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Luiz Capitulino, Gonglei, Michael Roth,
	zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                   | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  5 +++++
 include/block/block_int.h | 14 ++++++++++++++
 qapi/block-core.json      | 13 +++++++++++++
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index c9c913e..275d8b4 100644
--- a/block.c
+++ b/block.c
@@ -4389,3 +4389,46 @@ void bdrv_del_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 
     parent_bs->drv->bdrv_del_child(parent_bs, child_bs, errp);
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_start_replication) {
+        drv->bdrv_start_replication(bs, mode, errp);
+    } else if (bs->file) {
+        bdrv_start_replication(bs->file->bs, mode, errp);
+    } else {
+        error_setg(errp, "The BDS %s doesn't support starting block"
+                   " replication", bs->filename);
+    }
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_do_checkpoint) {
+        drv->bdrv_do_checkpoint(bs, errp);
+    } else if (bs->file) {
+        bdrv_do_checkpoint(bs->file->bs, errp);
+    } else {
+        error_setg(errp, "The BDS %s doesn't support block checkpoint",
+                   bs->filename);
+    }
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_stop_replication) {
+        drv->bdrv_stop_replication(bs, failover, errp);
+    } else if (bs->file) {
+        bdrv_stop_replication(bs->file->bs, failover, errp);
+    } else {
+        error_setg(errp, "The BDS %s doesn't support stopping block"
+                   " replication", bs->filename);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 6c7e54b..5d47cef 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -587,4 +587,9 @@ void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
 void bdrv_del_child(BlockDriverState *parent, BlockDriverState *child,
                     Error **errp);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 19c02b6..e31f9db 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -308,6 +308,20 @@ struct BlockDriver {
     void (*bdrv_del_child)(BlockDriverState *parent, BlockDriverState *child,
                            Error **errp);
 
+    void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+                                   Error **errp);
+    /* Drop Disk buffer when doing checkpoint. */
+    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+    /*
+     * After failover, we should flush the Disk buffer into the secondary
+     * disk and stop block replication.
+     *
+     * If the guest is shut down, we should drop the Disk buffer and stop
+     * block replication.
+     */
+    void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+                                  Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index fe63c6d..610da92 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1927,6 +1927,19 @@
             '*read-pattern': 'QuorumReadPattern' } }
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.6
+##
+{ 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
-- 
1.9.3
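All three wrappers added in block.c follow the same dispatch rule: use the driver's own callback if it has one, otherwise recurse into bs->file, otherwise fail. A toy model of that rule, as a hedged sketch: ToyBDS and ToyDriver are hypothetical stand-ins for BlockDriverState and BlockDriver, an int return code replaces Error **, and toy_last_started exists only so the example can observe which node handled the call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyBDS ToyBDS;

typedef struct {
    const char *format_name;
    int (*start_replication)(ToyBDS *bs);  /* NULL if not supported */
} ToyDriver;

struct ToyBDS {
    const char *filename;
    const ToyDriver *drv;
    ToyBDS *file;                          /* stands in for bs->file->bs */
};

static ToyBDS *toy_last_started;

/* Hypothetical callback standing in for the replication driver's. */
static int toy_replication_start(ToyBDS *bs)
{
    toy_last_started = bs;
    return 0;
}

/* Callback first, then the file child, else report "unsupported" (-1). */
static int toy_start_replication(ToyBDS *bs)
{
    assert(bs != NULL);
    if (bs->drv && bs->drv->start_replication) {
        return bs->drv->start_replication(bs);
    } else if (bs->file) {
        return toy_start_replication(bs->file);
    }
    return -1;
}
```

As in the patch, only the file child is followed; the backing chain is not searched.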


* [Qemu-devel] [PATCH v13 07/10] quorum: implement block driver interfaces for block replication
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 block/quorum.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index e73418c..aa8c4dd 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -85,6 +85,8 @@ typedef struct BDRVQuorumState {
     int bsize;
 
     QuorumReadPattern read_pattern;
+
+    int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -949,6 +951,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     s->bsize = s->num_children;
 
     g_free(opened);
+    s->replication_index = -1;
     goto exit;
 
 close_exit:
@@ -1146,6 +1149,77 @@ static void quorum_refresh_filename(BlockDriverState *bs, QDict *options)
     bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                                     Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int count = 0, i, index;
+    Error *local_err = NULL;
+
+    /*
+     * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != REPLICATION_MODE_PRIMARY) {
+        error_setg(errp, "The replication mode for quorum should be 'primary'");
+        return;
+    }
+
+    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+        error_setg(errp, "Block replication needs read pattern 'fifo'");
+        return;
+    }
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_start_replication(s->children[i]->bs, mode, &local_err);
+        if (local_err) {
+            error_free(local_err);
+            local_err = NULL;
+        } else {
+            count++;
+            index = i;
+        }
+    }
+
+    if (count == 0) {
+        error_setg(errp, "No child supports block replication");
+    } else if (count > 1) {
+        for (i = 0; i < s->num_children; i++) {
+            bdrv_stop_replication(s->children[i]->bs, false, NULL);
+        }
+        error_setg(errp, "Too many children support block replication");
+    } else {
+        s->replication_index = index;
+    }
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    bdrv_do_checkpoint(s->children[s->replication_index]->bs, errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+                                    Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    bdrv_stop_replication(s->children[s->replication_index]->bs, failover,
+                          errp);
+    s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name                        = "quorum",
     .protocol_name                      = "quorum",
@@ -1172,6 +1246,10 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+    .bdrv_start_replication             = quorum_start_replication,
+    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
+    .bdrv_stop_replication              = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
1.9.3
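quorum_start_replication() only accepts a configuration in which exactly
one child starts replication: zero children is an error, and more than
one causes every child to be stopped again. A simplified sketch of that
selection rule, assuming a hypothetical Child type with a
supports_replication flag in place of BdrvChild and
bdrv_start_replication():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical child: start succeeds only if replication is supported. */
typedef struct Child {
    bool supports_replication;
    bool running;
} Child;

static bool child_start(Child *c)
{
    if (!c->supports_replication) {
        return false;
    }
    c->running = true;
    return true;
}

static void child_stop(Child *c)
{
    c->running = false;
}

/*
 * Returns the index of the single child that accepted replication,
 * -1 if none did, or -2 if more than one did (all are stopped again).
 */
static int pick_replication_child(Child *children, int n)
{
    int count = 0, index = -1;

    for (int i = 0; i < n; i++) {
        if (child_start(&children[i])) {
            count++;
            index = i;
        }
    }

    if (count == 0) {
        return -1;
    }
    if (count > 1) {
        for (int i = 0; i < n; i++) {
            child_stop(&children[i]);
        }
        return -2;
    }
    return index;
}
```

Like the patch, the sketch stops every child in the "too many" case,
including children that never started; the real code relies on
bdrv_stop_replication() with a NULL errp failing quietly for those.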


* [Qemu-devel] [PATCH v13 08/10] Implement new driver for block replication
  2015-12-25 10:30 [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Changlong Xie
                   ` (6 preceding siblings ...)
  2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 07/10] quorum: implement block driver interfaces for " Changlong Xie
@ 2015-12-25 10:30 ` Changlong Xie
  2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 09/10] support replication driver in blockdev-add Changlong Xie
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 545 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 546 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index fa05f37..94c1d03 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 0000000..6a061c9
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,545 @@
+/*
+ * Replication Block filter
+ *
+ * Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2015 Intel Corporation
+ * Copyright (c) 2015 FUJITSU LIMITED
+ *
+ * Author:
+ *   Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    BlockDriverState *active_disk;
+    BlockDriverState *hidden_disk;
+    BlockDriverState *secondary_disk;
+    BlockDriverState *top_bs;
+    Error *blocker;
+    int orig_hidden_flags;
+    int orig_secondary_flags;
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,     /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+    BLOCK_REPLICATION_DONE,     /* block replication is done (failover) */
+};
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define REPLICATION_MODE        "mode"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        { /* end of list */ }
+    },
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    ret = 0;
+
+fail:
+    qemu_opts_del(opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(bs, false, NULL);
+    }
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file->bs);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+    switch (s->replication_state) {
+    case BLOCK_REPLICATION_NONE:
+        return -EIO;
+    case BLOCK_REPLICATION_RUNNING:
+        return 0;
+    case BLOCK_REPLICATION_DONE:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+    default:
+        abort();
+    }
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        return ret;
+    }
+
+    if (ret < 0) {
+        s->error = ret;
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        /* We only use it to forward primary write requests */
+        return -EIO;
+    }
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /*
+     * After failover the active/hidden disks are not committed to the
+     * secondary disk, so we must read from the active disk directly.
+     */
+    ret = bdrv_co_readv(bs->file->bs, sector_num, remaining_sectors, qiov);
+    return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int remaining_sectors,
+                                              QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    uint64_t bytes_done = 0;
+    BlockDriverState *top = bs->file->bs;
+    BlockDriverState *base = s->secondary_disk;
+    BlockDriverState *target;
+    int ret, n;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 0) {
+        ret = bdrv_co_writev(bs->file->bs, sector_num,
+                             remaining_sectors, qiov);
+        return replication_return_value(s, ret);
+    }
+
+    /*
+     * Only write to active disk if the sectors have
+     * already been allocated in active disk/hidden disk.
+     */
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    while (remaining_sectors > 0) {
+        ret = bdrv_is_allocated_above(top, base, sector_num,
+                                      remaining_sectors, &n);
+        if (ret < 0) {
+            break;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * BDRV_SECTOR_SIZE);
+
+        target = ret ? top : base;
+        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
+        if (ret < 0) {
+            break;
+        }
+
+        remaining_sectors -= n;
+        sector_num += n;
+        bytes_done += n * BDRV_SECTOR_SIZE;
+    }
+    qemu_iovec_destroy(&hd_qiov);
+    return ret < 0 ? ret : 0;
+}
+
+static coroutine_fn int replication_co_discard(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 1) {
+        /* This is the secondary QEMU and failover has happened */
+        ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    ret = bdrv_co_discard(bs->file->bs, sector_num, nb_sectors);
+    return replication_return_value(s, ret);
+}
+
+static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                    BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file->bs, candidate);
+}
+
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->job) {
+        error_setg(errp, "Backup job was cancelled unexpectedly");
+        return;
+    }
+
+    block_job_do_checkpoint(s->secondary_disk->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = s->hidden_disk->drv->bdrv_make_empty(s->hidden_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+static void reopen_backing_file(BDRVReplicationState *s, bool writable,
+                                Error **errp)
+{
+    BlockReopenQueue *reopen_queue = NULL;
+    int orig_hidden_flags, orig_secondary_flags;
+    int new_hidden_flags, new_secondary_flags;
+    Error *local_err = NULL;
+
+    if (writable) {
+        orig_hidden_flags = bdrv_get_flags(s->hidden_disk);
+        new_hidden_flags = orig_hidden_flags | BDRV_O_RDWR;
+        orig_secondary_flags = bdrv_get_flags(s->secondary_disk);
+        new_secondary_flags = orig_secondary_flags | BDRV_O_RDWR;
+    } else {
+        orig_hidden_flags = s->orig_hidden_flags | BDRV_O_RDWR;
+        new_hidden_flags = s->orig_hidden_flags;
+        orig_secondary_flags = s->orig_secondary_flags | BDRV_O_RDWR;
+        new_secondary_flags = s->orig_secondary_flags;
+    }
+
+    if (orig_hidden_flags != new_hidden_flags) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, s->hidden_disk, NULL,
+                                         new_hidden_flags);
+    }
+
+    if (!(orig_secondary_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, s->secondary_disk, NULL,
+                                         new_secondary_flags);
+    }
+
+    if (reopen_queue) {
+        bdrv_reopen_multiple(reopen_queue, &local_err);
+        if (local_err != NULL) {
+            error_propagate(errp, local_err);
+        }
+    }
+}
+
+static void backup_job_cleanup(BDRVReplicationState *s)
+{
+    bdrv_op_unblock_all(s->top_bs, s->blocker);
+    error_free(s->blocker);
+    reopen_backing_file(s, false, NULL);
+}
+
+static void backup_job_completed(void *opaque, int ret)
+{
+    BDRVReplicationState *s = opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_DONE) {
+        /* The backup job was cancelled unexpectedly */
+        s->error = -EIO;
+    }
+
+    backup_job_cleanup(s);
+}
+
+static BlockDriverState *get_top_bs(BlockDriverState *bs)
+{
+    BdrvChild *child;
+
+    while (!bs->blk) {
+        if (QLIST_EMPTY(&bs->parents)) {
+            return NULL;
+        }
+
+        child = QLIST_FIRST(&bs->parents);
+        if (QLIST_NEXT(child, next_parent)) {
+            return NULL;
+        }
+
+        bs = child->parent;
+    }
+
+    return bs;
+}
+
+static void replication_start(BlockDriverState *bs, ReplicationMode mode,
+                              Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int64_t active_length, hidden_length, disk_length;
+    AioContext *aio_context;
+    Error *local_err = NULL;
+
+    if (s->replication_state != BLOCK_REPLICATION_NONE) {
+        error_setg(errp, "Block replication is running or done");
+        return;
+    }
+
+    if (s->mode != mode) {
+        error_setg(errp, "Invalid value for parameter 'mode': expected %d,"
+                   " but got %d", s->mode, mode);
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        s->active_disk = bs->file->bs;
+        if (!bs->file->bs->backing) {
+            error_setg(errp, "Active disk doesn't have backing file");
+            return;
+        }
+
+        s->hidden_disk = s->active_disk->backing->bs;
+        if (!s->hidden_disk->backing) {
+            error_setg(errp, "Hidden disk doesn't have backing file");
+            return;
+        }
+
+        s->secondary_disk = s->hidden_disk->backing->bs;
+        if (!s->secondary_disk->blk) {
+            error_setg(errp, "The secondary disk doesn't have block backend");
+            return;
+        }
+
+        s->top_bs = get_top_bs(bs);
+        if (!s->top_bs) {
+            error_setg(errp, "Cannot get the top block driver state to do"
+                       " internal backup");
+            return;
+        }
+
+        /* verify the length */
+        active_length = bdrv_getlength(s->active_disk);
+        hidden_length = bdrv_getlength(s->hidden_disk);
+        disk_length = bdrv_getlength(s->secondary_disk);
+        if (active_length < 0 || hidden_length < 0 || disk_length < 0 ||
+            active_length != hidden_length || hidden_length != disk_length) {
+            error_setg(errp, "Active disk, hidden disk and secondary disk"
+                       " lengths are not the same");
+            return;
+        }
+
+        if (!s->active_disk->drv->bdrv_make_empty ||
+            !s->hidden_disk->drv->bdrv_make_empty) {
+            error_setg(errp,
+                       "active disk or hidden disk doesn't support make_empty");
+            return;
+        }
+
+        /* reopen the backing file in r/w mode */
+        reopen_backing_file(s, true, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        /* start backup job now */
+        error_setg(&s->blocker,
+                   "block device is in use by internal backup job");
+        bdrv_op_block_all(s->top_bs, s->blocker);
+        bdrv_op_unblock(s->top_bs, BLOCK_OP_TYPE_DATAPLANE, s->blocker);
+        bdrv_ref(s->hidden_disk);
+
+        aio_context = bdrv_get_aio_context(bs);
+        aio_context_acquire(aio_context);
+        backup_start(s->secondary_disk, s->hidden_disk, 0,
+                     MIRROR_SYNC_MODE_NONE, NULL, BLOCKDEV_ON_ERROR_REPORT,
+                     BLOCKDEV_ON_ERROR_REPORT, backup_job_completed,
+                     s, NULL, &local_err);
+        aio_context_release(aio_context);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            backup_job_cleanup(s);
+            bdrv_unref(s->hidden_disk);
+            return;
+        }
+        break;
+    default:
+        abort();
+    }
+
+    s->replication_state = BLOCK_REPLICATION_RUNNING;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+
+    s->error = 0;
+}
+
+static void replication_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    if (s->error) {
+        error_setg(errp, "An I/O error occurred");
+        return;
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+}
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    s->replication_state = BLOCK_REPLICATION_DONE;
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        if (!failover) {
+            /*
+             * This BDS will be closed, and the job must complete before
+             * that happens, because backup_job_completed() accesses the
+             * hidden disk and the secondary disk.
+             */
+            if (s->secondary_disk->job) {
+                block_job_cancel_sync(s->secondary_disk->job);
+            }
+            secondary_do_checkpoint(s, errp);
+            return;
+        }
+
+        if (s->secondary_disk->job) {
+            block_job_cancel(s->secondary_disk->job);
+        }
+        break;
+    default:
+        abort();
+    }
+}
+
+BlockDriver bdrv_replication = {
+    .format_name                = "replication",
+    .protocol_name              = "replication",
+    .instance_size              = sizeof(BDRVReplicationState),
+
+    .bdrv_open                  = replication_open,
+    .bdrv_close                 = replication_close,
+
+    .bdrv_getlength             = replication_getlength,
+    .bdrv_co_readv              = replication_co_readv,
+    .bdrv_co_writev             = replication_co_writev,
+    .bdrv_co_discard            = replication_co_discard,
+
+    .is_filter                  = true,
+    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
+
+    .bdrv_start_replication     = replication_start,
+    .bdrv_do_checkpoint         = replication_do_checkpoint,
+    .bdrv_stop_replication      = replication_stop,
+
+    .has_variable_length        = true,
+};
+
+static void bdrv_replication_init(void)
+{
+    bdrv_register(&bdrv_replication);
+}
+
+block_init(bdrv_replication_init);
-- 
1.9.3
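The replication driver's I/O paths hinge on replication_get_io_status()
and replication_return_value(): requests are rejected before replication
starts, forwarded while it runs, and handled specially on the secondary
after failover. On the primary, a failed forwarded request is recorded
in s->error instead of being returned to the guest, and surfaces at the
next checkpoint. The same state machine as a standalone C sketch, using
hypothetical Repl/Mode/ReplState names in place of BDRVReplicationState:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum { MODE_PRIMARY, MODE_SECONDARY } Mode;

typedef enum {
    REPL_NONE,     /* not started */
    REPL_RUNNING,  /* running */
    REPL_DONE,     /* stopped (failover) */
} ReplState;

typedef struct {
    Mode mode;
    ReplState state;
    int error;     /* sticky error recorded on the primary side */
} Repl;

/* -EIO: reject; 0: forward to the child; 1: secondary after failover. */
static int io_status(const Repl *s)
{
    switch (s->state) {
    case REPL_NONE:
        return -EIO;
    case REPL_RUNNING:
        return 0;
    case REPL_DONE:
        return s->mode == MODE_PRIMARY ? -EIO : 1;
    }
    abort();
}

/*
 * On the primary, a failed forwarded request must not fail the guest's
 * I/O: remember the error and report success for now.
 */
static int return_value(Repl *s, int ret)
{
    if (s->mode == MODE_SECONDARY) {
        return ret;
    }
    if (ret < 0) {
        s->error = ret;
        ret = 0;
    }
    return ret;
}
```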

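After failover on the secondary (replication_get_io_status() returns 1),
replication_co_writev() splits each request into runs with
bdrv_is_allocated_above() and sends each run either to the active disk,
if the active/hidden overlay already holds those sectors, or straight to
the secondary disk. A toy in-memory model of that routing; the Image
type and the flat per-sector allocation map are hypothetical stand-ins
for the qcow2 backing chain:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NB_SECTORS  8

/*
 * 'top' stands for the active+hidden overlay, 'base' for the secondary
 * disk; allocated[i] says whether sector i is present in the overlay.
 */
typedef struct {
    bool allocated[NB_SECTORS];
    unsigned char top[NB_SECTORS * SECTOR_SIZE];
    unsigned char base[NB_SECTORS * SECTOR_SIZE];
} Image;

/* Like bdrv_is_allocated_above(): status of 'sector' plus run length. */
static bool is_allocated_run(const Image *img, int sector, int nb, int *n)
{
    bool status = img->allocated[sector];
    int i = 1;

    while (i < nb && img->allocated[sector + i] == status) {
        i++;
    }
    *n = i;
    return status;
}

/* Write 'nb' sectors from 'buf', routing each run to top or base. */
static void failover_write(Image *img, int sector, int nb,
                           const unsigned char *buf)
{
    size_t done = 0;

    while (nb > 0) {
        int n;
        bool in_top = is_allocated_run(img, sector, nb, &n);
        unsigned char *target = in_top ? img->top : img->base;

        memcpy(target + (size_t)sector * SECTOR_SIZE, buf + done,
               (size_t)n * SECTOR_SIZE);
        nb -= n;
        sector += n;
        done += (size_t)n * SECTOR_SIZE;
    }
}
```

Writing sectors 1-4 with only sectors 2-3 allocated in the overlay
produces three runs: sector 1 to base, sectors 2-3 to top, sector 4 to
base.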

* [Qemu-devel] [PATCH v13 09/10] support replication driver in blockdev-add
  2015-12-25 10:30 [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Changlong Xie
                   ` (7 preceding siblings ...)
  2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 08/10] Implement new driver " Changlong Xie
@ 2015-12-25 10:30 ` Changlong Xie
  2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes Changlong Xie
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 27+ messages in thread
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi/block-core.json | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 610da92..7354c6a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -220,6 +220,7 @@
 #       2.2: 'archipelago' added, 'cow' dropped
 #       2.3: 'host_floppy' deprecated
 #       2.5: 'host_floppy' dropped
+#       2.6: 'replication' added
 #
 # @backing_file: #optional the name of the backing file (for copy-on-write)
 #
@@ -1492,6 +1493,7 @@
 # Drivers that are supported in block device operations.
 #
 # @host_device, @host_cdrom: Since 2.1
+# @replication: Since 2.6
 #
 # Since: 2.0
 ##
@@ -1499,8 +1501,8 @@
   'data': [ 'archipelago', 'blkdebug', 'blkverify', 'bochs', 'cloop',
             'dmg', 'file', 'ftp', 'ftps', 'host_cdrom', 'host_device',
             'http', 'https', 'null-aio', 'null-co', 'parallels',
-            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'tftp', 'vdi', 'vhdx',
-            'vmdk', 'vpc', 'vvfat' ] }
+            'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'replication',
+            'tftp', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
 
 ##
 # @BlockdevOptionsBase
@@ -1940,6 +1942,19 @@
 { 'enum' : 'ReplicationMode', 'data' : [ 'primary', 'secondary' ] }
 
 ##
+# @BlockdevOptionsReplication
+#
+# Driver specific block device options for replication
+#
+# @mode: the replication mode
+#
+# Since: 2.6
+##
+{ 'struct': 'BlockdevOptionsReplication',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'mode': 'ReplicationMode'  } }
+
+##
 # @BlockdevOptions
 #
 # Options for creating a block device.
@@ -1976,6 +1991,7 @@
       'quorum':     'BlockdevOptionsQuorum',
       'raw':        'BlockdevOptionsGenericFormat',
 # TODO rbd: Wait for structured options
+      'replication':'BlockdevOptionsReplication',
 # TODO sheepdog: Wait for structured options
 # TODO ssh: Should take InetSocketAddress for 'host'?
       'tftp':       'BlockdevOptionsFile',
-- 
1.9.3
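With this schema, blockdev-add passes "mode" as a typed ReplicationMode,
while the -drive/drive_add path still hands replication_open() a string
(as in the mode=secondary reproducer later in this thread), which it
maps by hand. A sketch of that mapping; parse_replication_mode() is a
hypothetical helper, and the -1 return replaces QEMU's Error reporting:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    REPLICATION_MODE_PRIMARY,
    REPLICATION_MODE_SECONDARY,
} ReplicationMode;

/*
 * Map the "mode" option string onto ReplicationMode.
 * Returns 0 on success, -1 if the option is missing or invalid.
 */
static int parse_replication_mode(const char *mode, ReplicationMode *out)
{
    if (!mode) {
        return -1;                  /* option not given at all */
    }
    if (!strcmp(mode, "primary")) {
        *out = REPLICATION_MODE_PRIMARY;
    } else if (!strcmp(mode, "secondary")) {
        *out = REPLICATION_MODE_SECONDARY;
    } else {
        return -1;                  /* neither primary nor secondary */
    }
    return 0;
}
```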


* [Qemu-devel] [PATCH v13 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes
  2015-12-25 10:30 [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Changlong Xie
                   ` (8 preceding siblings ...)
  2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 09/10] support replication driver in blockdev-add Changlong Xie
@ 2015-12-25 10:30 ` Changlong Xie
  2016-01-22 15:14 ` [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Dr. David Alan Gilbert
  2016-01-27 11:03 ` Dr. David Alan Gilbert
  11 siblings, 0 replies; 27+ messages in thread
From: Changlong Xie @ 2015-12-25 10:30 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini, Kevin Wolf,
	Stefan Hajnoczi
  Cc: qemu block, Jiang Yunhong, Dong Eddie, Dr. David Alan Gilbert,
	Michael R. Hines, Gonglei, zhanghailiang

From: Wen Congyang <wency@cn.fujitsu.com>

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
---
 block.c               | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/block/block.h |  4 +++
 2 files changed, 87 insertions(+)

diff --git a/block.c b/block.c
index 275d8b4..634cc97 100644
--- a/block.c
+++ b/block.c
@@ -4432,3 +4432,86 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
                    " replication", bs->filename);
     }
 }
+
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp)
+{
+    BlockDriverState *bs = NULL, *temp = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (!QLIST_EMPTY(&bs->parents)) {
+            /* It is not top BDS */
+            continue;
+        }
+
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_start_replication(bs, mode, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+    }
+
+    return;
+
+fail:
+    while ((temp = bdrv_next(temp)) && bs != temp) {
+        bdrv_stop_replication(temp, false, NULL);
+    }
+}
+
+void bdrv_do_checkpoint_all(Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (!QLIST_EMPTY(&bs->parents)) {
+            /* It is not top BDS */
+            continue;
+        }
+
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_do_checkpoint(bs, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void bdrv_stop_replication_all(bool failover, Error **errp)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    while ((bs = bdrv_next(bs))) {
+        if (!QLIST_EMPTY(&bs->parents)) {
+            /* It is not top BDS */
+            continue;
+        }
+
+        if (bdrv_is_read_only(bs) || !bdrv_is_inserted(bs)) {
+            continue;
+        }
+
+        bdrv_stop_replication(bs, failover, errp ? &local_err : NULL);
+        if (!errp) {
+            /*
+             * The caller doesn't care about the result; it just
+             * wants to stop replication on all block devices.
+             */
+            continue;
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 5d47cef..9c4de14 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -592,4 +592,8 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
 void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
 
+void bdrv_start_replication_all(ReplicationMode mode, Error **errp);
+void bdrv_do_checkpoint_all(Error **errp);
+void bdrv_stop_replication_all(bool failover, Error **errp);
+
 #endif
-- 
1.9.3
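bdrv_start_replication_all() walks every BDS, skips nodes that have
parents or are read-only/empty, and on the first failure rolls back the
devices before the failing one. A sketch of that walk-and-rollback
pattern over a hypothetical flattened Dev array (standing in for the
bdrv_next() list), with a will_fail knob to simulate an error:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool is_top;       /* no parents */
    bool writable;     /* not read-only, medium inserted */
    bool will_fail;    /* start fails on this device (test knob) */
    bool running;
} Dev;

static bool eligible(const Dev *d)
{
    return d->is_top && d->writable;
}

static bool dev_start(Dev *d)
{
    if (d->will_fail) {
        return false;
    }
    d->running = true;
    return true;
}

/*
 * Start replication on every eligible device. On the first failure,
 * stop the eligible devices that precede it and return its index;
 * return -1 if everything started.
 */
static int start_replication_all(Dev *devs, int n)
{
    for (int i = 0; i < n; i++) {
        if (!eligible(&devs[i])) {
            continue;
        }
        if (!dev_start(&devs[i])) {
            for (int j = 0; j < i; j++) {
                if (eligible(&devs[j])) {
                    devs[j].running = false;
                }
            }
            return i;
        }
    }
    return -1;
}
```

One design difference worth noting: the patch's rollback loop calls
bdrv_stop_replication() on every BDS before the failing one with a NULL
errp, relying on it to fail quietly for nodes that never started; the
sketch instead reuses the same eligibility test for the rollback.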


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2015-12-25 10:30 [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Changlong Xie
                   ` (9 preceding siblings ...)
  2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes Changlong Xie
@ 2016-01-22 15:14 ` Dr. David Alan Gilbert
  2016-01-25  1:06   ` Wen Congyang
  2016-01-25  1:20   ` Wen Congyang
  2016-01-27 11:03 ` Dr. David Alan Gilbert
  11 siblings, 2 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-22 15:14 UTC (permalink / raw)
  To: Changlong Xie, zhanghailiang, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

Hi,
  I can trigger a segfault if I wire in the block replication together with
a quorum instance; it only triggers with both of them present, but
it looks like the problem is a disagreement about the number of quorum
members. I'm triggering this on the 'colo-v2.4-periodic-mode' branch
that is posted in the colo-framework set that I think includes this set
(from https://github.com/coloft/qemu.git).

To trigger:
./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S

(qemu) drive_add 0 if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
(qemu) drive_add 1 if=none,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
(qemu) drive_add 2 if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
(qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9

*** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': free(): invalid pointer: 0x0000555555a8fdf0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7cfe1)[0x7ffff110ffe1]
/lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff1ecc36f]
/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
Program received signal SIGABRT, Aborted.
0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
(gdb) where
#0  0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
#1  0x00007ffff10c9ce8 in abort () from /lib64/libc.so.6
#2  0x00007ffff1108317 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff110ffe1 in _int_free () from /lib64/libc.so.6
#4  0x00007ffff1ecc36f in g_free () from /lib64/libglib-2.0.so.0
#5  0x00005555559dfdd7 in qemu_iovec_destroy (qiov=0x555557815410) at /root/colo/jan-2016/qemu/util/iov.c:378
#6  0x0000555555989cce in quorum_aio_finalize (acb=0x555557815350) at /root/colo/jan-2016/qemu/block/quorum.c:171
171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
(gdb) list
166	
167	    if (acb->is_read) {
168	        /* on the quorum case acb->child_iter == s->num_children - 1 */
169	        for (i = 0; i <= acb->child_iter; i++) {
170	            qemu_vfree(acb->qcrs[i].buf);
171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
172	        }
173	    }
174	
175	    g_free(acb->qcrs);
(gdb) p acb->child_iter
$1 = 1
(gdb) p i
$3 = 1

#7  0x000055555598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
    at /root/colo/jan-2016/qemu/block/quorum.c:302
#8  0x00005555559990ee in bdrv_co_complete (acb=0x555557815410) at /root/colo/jan-2016/qemu/block/io.c:2122
.....

So I guess acb->child_iter is wrong: we only have one child on that quorum,
yet we're trying to destroy the qiov of a second child.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-22 15:14 ` [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Dr. David Alan Gilbert
@ 2016-01-25  1:06   ` Wen Congyang
  2016-01-25 12:10     ` Dr. David Alan Gilbert
  2016-01-25  1:20   ` Wen Congyang
  1 sibling, 1 reply; 27+ messages in thread
From: Wen Congyang @ 2016-01-25  1:06 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Changlong Xie, zhanghailiang
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I can trigger a segfault if I wire in the block replication together with
> a quorum instance; it only triggers with both of them present but,
> it looks like the problem is a disagreement about the number of quorum
> members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> that is posted in the colo-framework set that I think includes this set
> (from https://github.com/coloft/qemu.git).
> 
> To trigger:
> ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> 
> (qemu) drive_add 0 if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> (qemu) drive_add 1 if=none,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> (qemu) drive_add 2 if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> 
> *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': free(): invalid pointer: 0x0000555555a8fdf0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7cfe1)[0x7ffff110ffe1]
> /lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff1ecc36f]
> /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> Program received signal SIGABRT, Aborted.
> 0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> (gdb) where
> #0  0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> #1  0x00007ffff10c9ce8 in abort () from /lib64/libc.so.6
> #2  0x00007ffff1108317 in __libc_message () from /lib64/libc.so.6
> #3  0x00007ffff110ffe1 in _int_free () from /lib64/libc.so.6
> #4  0x00007ffff1ecc36f in g_free () from /lib64/libglib-2.0.so.0
> #5  0x00005555559dfdd7 in qemu_iovec_destroy (qiov=0x555557815410) at /root/colo/jan-2016/qemu/util/iov.c:378
> #6  0x0000555555989cce in quorum_aio_finalize (acb=0x555557815350) at /root/colo/jan-2016/qemu/block/quorum.c:171
> 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> (gdb) list
> 166	
> 167	    if (acb->is_read) {
> 168	        /* on the quorum case acb->child_iter == s->num_children - 1 */
> 169	        for (i = 0; i <= acb->child_iter; i++) {
> 170	            qemu_vfree(acb->qcrs[i].buf);
> 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> 172	        }
> 173	    }
> 174	
> 175	    g_free(acb->qcrs);
> (gdb) p acb->child_iter
> $1 = 1
> (gdb) p i
> $3 = 1

Thanks for your test. Can you give me the following information:
1. acb->ret's value
2. s->num_children

I think it is a bug in quorum, and acb->ret is < 0.

Thanks
Wen Congyang

> 
> #7  0x000055555598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
>     at /root/colo/jan-2016/qemu/block/quorum.c:302
> #8  0x00005555559990ee in bdrv_co_complete (acb=0x555557815410) at /root/colo/jan-2016/qemu/block/io.c:2122
> .....
> 
> So I guess acb->child_iter is wrong, since we only have one child on that quorum?
> and we're trying to do a destroy on the second child.
> 
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-22 15:14 ` [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Dr. David Alan Gilbert
  2016-01-25  1:06   ` Wen Congyang
@ 2016-01-25  1:20   ` Wen Congyang
  2016-01-25 11:56     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 27+ messages in thread
From: Wen Congyang @ 2016-01-25  1:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Changlong Xie, zhanghailiang
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I can trigger a segfault if I wire in the block replication together with
> a quorum instance; it only triggers with both of them present but,
> it looks like the problem is a disagreement about the number of quorum
> members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> that is posted in the colo-framework set that I think includes this set
> (from https://github.com/coloft/qemu.git).
> 
> To trigger:
> ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> 
> (qemu) drive_add 0 if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> (qemu) drive_add 1 if=none,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> (qemu) drive_add 2 if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> 
> *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': free(): invalid pointer: 0x0000555555a8fdf0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7cfe1)[0x7ffff110ffe1]
> /lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff1ecc36f]
> /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> Program received signal SIGABRT, Aborted.
> 0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> (gdb) where
> #0  0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> #1  0x00007ffff10c9ce8 in abort () from /lib64/libc.so.6
> #2  0x00007ffff1108317 in __libc_message () from /lib64/libc.so.6
> #3  0x00007ffff110ffe1 in _int_free () from /lib64/libc.so.6
> #4  0x00007ffff1ecc36f in g_free () from /lib64/libglib-2.0.so.0
> #5  0x00005555559dfdd7 in qemu_iovec_destroy (qiov=0x555557815410) at /root/colo/jan-2016/qemu/util/iov.c:378
> #6  0x0000555555989cce in quorum_aio_finalize (acb=0x555557815350) at /root/colo/jan-2016/qemu/block/quorum.c:171
> 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> (gdb) list
> 166	
> 167	    if (acb->is_read) {
> 168	        /* on the quorum case acb->child_iter == s->num_children - 1 */
> 169	        for (i = 0; i <= acb->child_iter; i++) {
> 170	            qemu_vfree(acb->qcrs[i].buf);
> 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> 172	        }
> 173	    }
> 174	
> 175	    g_free(acb->qcrs);
> (gdb) p acb->child_iter
> $1 = 1
> (gdb) p i
> $3 = 1
> 
> #7  0x000055555598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
>     at /root/colo/jan-2016/qemu/block/quorum.c:302
> #8  0x00005555559990ee in bdrv_co_complete (acb=0x555557815410) at /root/colo/jan-2016/qemu/block/io.c:2122
> .....
> 
> So I guess acb->child_iter is wrong, since we only have one child on that quorum?
> and we're trying to do a destroy on the second child.

Can you try the following patch:
From 3f2c5ec288cd9a36afb392b4bba24029f3e9345a Mon Sep 17 00:00:00 2001
From: Wen Congyang <wency@cn.fujitsu.com>
Date: Mon, 25 Jan 2016 09:18:09 +0800
Subject: [PATCH] quorum: fix segfault when read fails in fifo mode

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
 block/quorum.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index a5ae4b8..0965277 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -295,6 +295,9 @@ static void quorum_aio_cb(void *opaque, int ret)
             quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
         }
         acb->vote_ret = ret;
+        if (ret < 0) {
+            acb->child_iter--;
+        }
         quorum_aio_finalize(acb);
         return;
     }
-- 
2.5.0



> 
> Dave
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-25  1:20   ` Wen Congyang
@ 2016-01-25 11:56     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-25 11:56 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I can trigger a segfault if I wire in the block replication together with
> > a quorum instance; it only triggers with both of them present but,
> > it looks like the problem is a disagreement about the number of quorum
> > members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> > that is posted in the colo-framework set that I think includes this set
> > (from https://github.com/coloft/qemu.git).
> > 
> > To trigger:
> > ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> > 
> > (qemu) drive_add 0 if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> > (qemu) drive_add 1 if=none,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> > (qemu) drive_add 2 if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> > (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> > 
> > *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': free(): invalid pointer: 0x0000555555a8fdf0 ***
> > ======= Backtrace: =========
> > /lib64/libc.so.6(+0x7cfe1)[0x7ffff110ffe1]
> > /lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff1ecc36f]
> > /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> > Program received signal SIGABRT, Aborted.
> > 0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> > (gdb) where
> > #0  0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> > #1  0x00007ffff10c9ce8 in abort () from /lib64/libc.so.6
> > #2  0x00007ffff1108317 in __libc_message () from /lib64/libc.so.6
> > #3  0x00007ffff110ffe1 in _int_free () from /lib64/libc.so.6
> > #4  0x00007ffff1ecc36f in g_free () from /lib64/libglib-2.0.so.0
> > #5  0x00005555559dfdd7 in qemu_iovec_destroy (qiov=0x555557815410) at /root/colo/jan-2016/qemu/util/iov.c:378
> > #6  0x0000555555989cce in quorum_aio_finalize (acb=0x555557815350) at /root/colo/jan-2016/qemu/block/quorum.c:171
> > 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> > (gdb) list
> > 166	
> > 167	    if (acb->is_read) {
> > 168	        /* on the quorum case acb->child_iter == s->num_children - 1 */
> > 169	        for (i = 0; i <= acb->child_iter; i++) {
> > 170	            qemu_vfree(acb->qcrs[i].buf);
> > 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> > 172	        }
> > 173	    }
> > 174	
> > 175	    g_free(acb->qcrs);
> > (gdb) p acb->child_iter
> > $1 = 1
> > (gdb) p i
> > $3 = 1
> > 
> > #7  0x000055555598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
> >     at /root/colo/jan-2016/qemu/block/quorum.c:302
> > #8  0x00005555559990ee in bdrv_co_complete (acb=0x555557815410) at /root/colo/jan-2016/qemu/block/io.c:2122
> > .....
> > 
> > So I guess acb->child_iter is wrong, since we only have one child on that quorum?
> > and we're trying to do a destroy on the second child.
> 
> Can you try the following patch:
> From 3f2c5ec288cd9a36afb392b4bba24029f3e9345a Mon Sep 17 00:00:00 2001
> From: Wen Congyang <wency@cn.fujitsu.com>
> Date: Mon, 25 Jan 2016 09:18:09 +0800
> Subject: [PATCH] quorum: fix segfault when read fails in fifo mode
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
>  block/quorum.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index a5ae4b8..0965277 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -295,6 +295,9 @@ static void quorum_aio_cb(void *opaque, int ret)
>              quorum_copy_qiov(acb->qiov, &acb->qcrs[acb->child_iter].qiov);
>          }
>          acb->vote_ret = ret;
> +        if (ret < 0) {
> +            acb->child_iter--;
> +        }
>          quorum_aio_finalize(acb);
>          return;
>      }

Yes, that seems to fix it; thanks.

Dave

> -- 
> 2.5.0
> 
> 
> 
> > 
> > Dave
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-25  1:06   ` Wen Congyang
@ 2016-01-25 12:10     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-25 12:10 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Stefan Hajnoczi, Paolo Bonzini

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 01/22/2016 11:14 PM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I can trigger a segfault if I wire in the block replication together with
> > a quorum instance; it only triggers with both of them present but,
> > it looks like the problem is a disagreement about the number of quorum
> > members;  I'm triggering this on the 'colo-v2.4-periodic-mode' branch
> > that is posted in the colo-framework set that I think includes this set
> > (from https://github.com/coloft/qemu.git).
> > 
> > To trigger:
> > ./git/colo/jan-16/try/x86_64-softmmu/qemu-system-x86_64 -nographic -S
> > 
> > (qemu) drive_add 0 if=none,id=colo-disk0,file.filename=/home/localvms/bugzilla.raw,driver=raw,node-name=node0
> > (qemu) drive_add 1 if=none,id=active-disk0,throttling.bps-total=70000000,driver=replication,mode=secondary,file.driver=qcow2,file.file.filename=/run/colo-active-disk.qcow2,file.backing.driver=qcow2,file.backing.file.filename=/run/colo-hidden-disk.qcow2,file.backing.backing=colo-disk0
> > (qemu) drive_add 2 if=none,id=top-quorum,driver=quorum,read-pattern=fifo,vote-threshold=1,children.0=active-disk0
> > (qemu) device_add virtio-blk-pci,drive=top-quorum,addr=9
> > 
> > *** Error in `/root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64': free(): invalid pointer: 0x0000555555a8fdf0 ***
> > ======= Backtrace: =========
> > /lib64/libc.so.6(+0x7cfe1)[0x7ffff110ffe1]
> > /lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff1ecc36f]
> > /root/colo/jan-2016/./try/x86_64-softmmu/qemu-system-x86_64
> > Program received signal SIGABRT, Aborted.
> > 0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> > (gdb) where
> > #0  0x00007ffff10c85f7 in raise () from /lib64/libc.so.6
> > #1  0x00007ffff10c9ce8 in abort () from /lib64/libc.so.6
> > #2  0x00007ffff1108317 in __libc_message () from /lib64/libc.so.6
> > #3  0x00007ffff110ffe1 in _int_free () from /lib64/libc.so.6
> > #4  0x00007ffff1ecc36f in g_free () from /lib64/libglib-2.0.so.0
> > #5  0x00005555559dfdd7 in qemu_iovec_destroy (qiov=0x555557815410) at /root/colo/jan-2016/qemu/util/iov.c:378
> > #6  0x0000555555989cce in quorum_aio_finalize (acb=0x555557815350) at /root/colo/jan-2016/qemu/block/quorum.c:171
> > 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> > (gdb) list
> > 166	
> > 167	    if (acb->is_read) {
> > 168	        /* on the quorum case acb->child_iter == s->num_children - 1 */
> > 169	        for (i = 0; i <= acb->child_iter; i++) {
> > 170	            qemu_vfree(acb->qcrs[i].buf);
> > 171	            qemu_iovec_destroy(&acb->qcrs[i].qiov);
> > 172	        }
> > 173	    }
> > 174	
> > 175	    g_free(acb->qcrs);
> > (gdb) p acb->child_iter
> > $1 = 1
> > (gdb) p i
> > $3 = 1
> 
> Thanks for your test. Can you give me the following information:
> 1. acb->ret's value

(gdb) p acb->ret
There is no member named ret.
(gdb) p acb->vote_ret
$2 = -5

> 2. s->num_children

(gdb) p ((BDRVQuorumState *)acb->common.bs->opaque)->num_children
$5 = 1

Dave

> 
> I think it is quorum's bug, and acb->ret is < 0.
> 
> Thanks
> Wen Congyang
> 
> > 
> > #7  0x000055555598afca in quorum_aio_cb (opaque=<optimized out>, ret=-5)
> >     at /root/colo/jan-2016/qemu/block/quorum.c:302
> > #8  0x00005555559990ee in bdrv_co_complete (acb=0x555557815410) at /root/colo/jan-2016/qemu/block/io.c:2122
> > .....
> > 
> > So I guess acb->child_iter is wrong, since we only have one child on that quorum?
> > and we're trying to do a destroy on the second child.
> > 
> > Dave
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2015-12-25 10:30 [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Changlong Xie
                   ` (10 preceding siblings ...)
  2016-01-22 15:14 ` [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Dr. David Alan Gilbert
@ 2016-01-27 11:03 ` Dr. David Alan Gilbert
  2016-01-29  6:52   ` Wen Congyang
  11 siblings, 1 reply; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-27 11:03 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, zhanghailiang

Hi,
  I get a block error if I kill the secondary:

1. Start both primary & secondary
2. kill -9 the secondary qemu
3. x_colo_lost_heartbeat on the primary

The guest sees a block error and the ext4 root switches to read-only.

I gdb'd the primary with a breakpoint on quorum_report_bad; see the
backtrace below.
(This is based on the colo-v2.4-periodic-mode branch of the framework
code with the block and network proxy merged in, so it could be my
merging, but I don't think so.)


(gdb) where
#0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
    at /root/colo/jan-2016/qemu/block/quorum.c:222
#1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
    at /root/colo/jan-2016/qemu/block/quorum.c:315
#2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
#3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
#4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
#5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
#6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
    user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
#7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
#9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
#10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
#11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
#12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707

(gdb) p s->num_children
$1 = 2
(gdb) p acb->success_count
$2 = 0
(gdb) p acb->is_read
$5 = false

(qemu) info block
colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
    Cache mode:       writeback, direct

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-27 11:03 ` Dr. David Alan Gilbert
@ 2016-01-29  6:52   ` Wen Congyang
  2016-01-29 10:07     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 27+ messages in thread
From: Wen Congyang @ 2016-01-29  6:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini

On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> Hi,
>   I've got a block error if I kill the secondary.
> 
> Start both primary & secondary
> kill -9 secondary qemu
> x_colo_lost_heartbeat on primary
> 
> The guest sees a block error and the ext4 root switches to read-only.
> 
> I gdb'd the primary with a breakpoint on quorum_report_bad; see
> backtrace below.
> (This is based on colo-v2.4-periodic-mode of the framework
> code with the block and network proxy merged in; so it could be my
> merging but I don't think so ?)
> 
> 
> (gdb) where
> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>     at /root/colo/jan-2016/qemu/block/quorum.c:222
> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>     at /root/colo/jan-2016/qemu/block/quorum.c:315
> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> 
> (gdb) p s->num_children
> $1 = 2
> (gdb) p acb->success_count
> $2 = 0
> (gdb) p acb->is_read
> $5 = false

Sorry for the late reply.
What is the value of acb->count?

If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
I/O error events.

Thanks
Wen Congyang

> 
> (qemu) info block
> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>     Cache mode:       writeback, direct
> 
> Dave
> 


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-29  6:52   ` Wen Congyang
@ 2016-01-29 10:07     ` Dr. David Alan Gilbert
  2016-01-29 10:27       ` Wen Congyang
  0 siblings, 1 reply; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-29 10:07 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I've got a block error if I kill the secondary.
> > 
> > Start both primary & secondary
> > kill -9 secondary qemu
> > x_colo_lost_heartbeat on primary
> > 
> > The guest sees a block error and the ext4 root switches to read-only.
> > 
> > I gdb'd the primary with a breakpoint on quorum_report_bad; see
> > backtrace below.
> > (This is based on colo-v2.4-periodic-mode of the framework
> > code with the block and network proxy merged in; so it could be my
> > merging but I don't think so?)
> > 
> > 
> > (gdb) where
> > #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> >     at /root/colo/jan-2016/qemu/block/quorum.c:222
> > #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
> >     at /root/colo/jan-2016/qemu/block/quorum.c:315
> > #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> > #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> > #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> > #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> > #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> > #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> > #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> > #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> > #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> > #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> > #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> > 
> > (gdb) p s->num_children
> > $1 = 2
> > (gdb) p acb->success_count
> > $2 = 0
> > (gdb) p acb->is_read
> > $5 = false
> 
> Sorry for the late reply.

No problem.

> What is the value of acb->count?

(gdb) p acb->count
$1 = 1

> If secondary host is down, you should remove quorum's children.1. Otherwise, you will get
> an I/O error event.

Is that safe? If the secondary fails, do you always have time to issue the command to
remove children.1 before the guest sees the error?

Anyway, I tried removing children.1, but now it segfaults; I guess the replication is unhappy:

(qemu) x_block_change colo-disk0 -d children.1
(qemu) x_colo_lost_heartbeat 

12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param

#0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
    at /root/colo/jan-2016/qemu/block.c:4426

(gdb) p drv
$1 = (BlockDriver *) 0x5d2a

It looks like the whole of bs is bogus.

#1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>, 
    errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213

(gdb) p s->replication_index
$3 = 1

I guess quorum_del_child needs to stop replication before it removes the child?
(although it would have to be careful not to block on the dead nbd).

#2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
    at /root/colo/jan-2016/qemu/block.c:4504
#3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
#4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
#5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
#6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
#7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
#8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
    user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
#9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
#11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
#12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
#13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
#14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
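
Dave's guess that quorum_del_child should stop replication before removing the child (without blocking on the dead NBD connection) can be sketched with a toy model. Everything here is hypothetical: ModelQuorum, ModelChild and the model_* functions are invented for illustration and are not QEMU's quorum driver API. The crash above is consistent with replication_index (1 in gdb) still naming a slot whose child has already been removed; the sketch shows one way to keep the index valid, by stopping replication on the removed child and shifting the index along with the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_CHILDREN 8

/* Hypothetical stand-ins for a quorum child and the quorum state;
 * this is not QEMU's BDRVQuorumState. */
typedef struct {
    const char *name;
    bool replicating;
} ModelChild;

typedef struct {
    ModelChild *children[MODEL_MAX_CHILDREN];
    int num_children;
    int replication_index;  /* -1 when no child is replicating */
} ModelQuorum;

/* Hypothetical stop hook. On failover it must not wait for requests
 * queued on a dead NBD connection; here it just clears a flag. */
static void model_child_stop_replication(ModelChild *c, bool failover)
{
    (void)failover;
    c->replicating = false;
}

/* Remove child i. Replication is stopped on it first, and
 * replication_index is kept in step with the shifted array, so a later
 * failover never dereferences a removed child. */
int model_quorum_del_child(ModelQuorum *q, int i)
{
    if (i < 0 || i >= q->num_children) {
        return -1;
    }
    if (i == q->replication_index) {
        model_child_stop_replication(q->children[i], true);
        q->replication_index = -1;
    } else if (q->replication_index > i) {
        q->replication_index--;   /* later entries shift down by one */
    }
    memmove(&q->children[i], &q->children[i + 1],
            (size_t)(q->num_children - i - 1) * sizeof(q->children[0]));
    q->num_children--;
    q->children[q->num_children] = NULL;
    return 0;
}

/* Failover path: only touches a child while replication_index is valid. */
void model_quorum_stop_replication(ModelQuorum *q, bool failover)
{
    if (q->replication_index < 0) {
        return;
    }
    assert(q->replication_index < q->num_children);
    model_child_stop_replication(q->children[q->replication_index], failover);
}
```

With two children and replication_index 1, deleting child 1 leaves one child and replication_index -1, so the failover call becomes a no-op instead of following a stale pointer.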

Dave

> Thanks
> Wen Congyang
> 
> > 
> > (qemu) info block
> > colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
> >     Cache mode:       writeback, direct
> > 
> > Dave
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-29 10:07     ` Dr. David Alan Gilbert
@ 2016-01-29 10:27       ` Wen Congyang
  2016-01-29 10:47         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 27+ messages in thread
From: Wen Congyang @ 2016-01-29 10:27 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini

On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>> Hi,
>>>   I've got a block error if I kill the secondary.
>>>
>>> Start both primary & secondary
>>> kill -9 secondary qemu
>>> x_colo_lost_heartbeat on primary
>>>
>>> The guest sees a block error and the ext4 root switches to read-only.
>>>
>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>> backtrace below.
>>> (This is based on colo-v2.4-periodic-mode of the framework
>>> code with the block and network proxy merged in; so it could be my
>>> merging but I don't think so?)
>>>
>>>
>>> (gdb) where
>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
>>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
>>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
>>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
>>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>
>>> (gdb) p s->num_children
>>> $1 = 2
>>> (gdb) p acb->success_count
>>> $2 = 0
>>> (gdb) p acb->is_read
>>> $5 = false
>>
>> Sorry for the late reply.
> 
> No problem.
> 
>> What is the value of acb->count?
> 
> (gdb) p acb->count
> $1 = 1

Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to
children.0 succeeds, the guest does not see this error.

> 
>> If secondary host is down, you should remove quorum's children.1. Otherwise, you will get
>> an I/O error event.
> 
> Is that safe? If the secondary fails, do you always have time to issue the command to
> remove children.1 before the guest sees the error?

We write to both children and expect the write to children.0 to succeed. If it does,
the guest does not see this error; you just get the I/O error event.
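
The behaviour described above can be illustrated with a small model of one quorum write. This is a hypothetical, simplified sketch: ModelQuorumWrite and the model_* functions are invented names, not QEMU's QuorumAIOCB or quorum_aio_cb(). It assumes, as the explanation above states, that a per-child failure is reported immediately while the guest-visible result is decided only after every child has answered. The ret=-5 in the backtrace is -EIO on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one quorum write request.
 * These names are invented for illustration; this is not QEMU's
 * QuorumAIOCB or quorum_aio_cb(). */
typedef struct {
    int num_children;   /* children the write was issued to */
    int threshold;      /* vote-threshold: successes required */
    int count;          /* child completions seen so far */
    int success_count;  /* child completions that succeeded */
    int error_events;   /* per-child errors reported to management */
    bool done;          /* has the guest-visible completion happened? */
    int guest_ret;      /* result handed to the guest: 0 or -EIO */
} ModelQuorumWrite;

void model_write_init(ModelQuorumWrite *w, int num_children, int threshold)
{
    assert(threshold >= 1 && threshold <= num_children);
    *w = (ModelQuorumWrite){ .num_children = num_children,
                             .threshold = threshold };
}

/* Called once per child when its write completes with status ret. */
void model_child_done(ModelQuorumWrite *w, int ret)
{
    assert(!w->done);
    w->count++;
    if (ret < 0) {
        /* A failing child raises an event right away (compare
         * quorum_report_bad in the backtrace), but the guest is not
         * told anything yet. */
        w->error_events++;
    } else {
        w->success_count++;
    }
    if (w->count < w->num_children) {
        return;             /* other children are still in flight */
    }
    /* Every child has answered: vote against the threshold. */
    w->done = true;
    w->guest_ret = (w->success_count >= w->threshold) ? 0 : -EIO;
}
```

With two children and vote-threshold 1, as in the `info block` output, a failure on children.1 that arrives first leaves count == 1 and success_count == 0, matching the gdb state; under this model the guest sees -EIO only if children.0 also fails.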

> 
> Anyway, I tried removing children.1, but now it segfaults; I guess the replication is unhappy:
> 
> (qemu) x_block_change colo-disk0 -d children.1
> (qemu) x_colo_lost_heartbeat 

Hmm, you should not remove the child before failover. I will check how to prevent this in the code.

> 
> 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
> 
> #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
>     at /root/colo/jan-2016/qemu/block.c:4426
> 
> (gdb) p drv
> $1 = (BlockDriver *) 0x5d2a
> 
> It looks like the whole of bs is bogus.
> 
> #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>, 
>     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
> 
> (gdb) p s->replication_index
> $3 = 1
> 
> I guess quorum_del_child needs to stop replication before it removes the child?

Yes, but in the newest version quorum doesn't know about block replication, and I think
we should take a reference to the bs when starting block replication.
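
The reference-counting idea above can be sketched as a plain reference count held for the lifetime of replication. This is a hypothetical model, not QEMU code: ModelBDS and the model_* functions are invented, and QEMU's real counterparts would presumably be bdrv_ref()/bdrv_unref(). It assumes the quorum holds one reference and replication takes a second one at start, so removing children.1 from the quorum no longer frees the node that the failover path later dereferences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted node standing in for a
 * BlockDriverState; this is not QEMU code. */
typedef struct {
    int refcnt;
    bool replicating;
} ModelBDS;

static int model_live_nodes;  /* lets the example observe frees */

ModelBDS *model_bds_new(void)
{
    ModelBDS *bs = calloc(1, sizeof(*bs));
    if (bs) {
        bs->refcnt = 1;          /* the creator's (quorum's) reference */
        model_live_nodes++;
    }
    return bs;
}

void model_bds_ref(ModelBDS *bs)
{
    bs->refcnt++;
}

void model_bds_unref(ModelBDS *bs)
{
    assert(bs->refcnt > 0);
    if (--bs->refcnt == 0) {
        free(bs);
        model_live_nodes--;
    }
}

/* Starting replication pins the node with a reference of its own... */
void model_repl_start(ModelBDS *bs)
{
    model_bds_ref(bs);
    bs->replicating = true;
}

/* ...and stopping it drops that reference, which may free the node.
 * The caller must not use bs afterwards. */
void model_repl_stop(ModelBDS *bs)
{
    bs->replicating = false;
    model_bds_unref(bs);
}
```

Usage: after `model_repl_start(child)`, dropping the quorum's reference with `model_bds_unref(child)` leaves the node alive with `refcnt == 1`; only `model_repl_stop(child)` at failover frees it.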

Thanks
Wen Congyang

> (although it would have to be careful not to block on the dead nbd).
> 
> #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
>     at /root/colo/jan-2016/qemu/block.c:4504
> #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
> #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
> #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
> #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> 
> Dave
> 
>> Thanks
>> Wen Congyang
>>
>>>
>>> (qemu) info block
>>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>>>     Cache mode:       writeback, direct
>>>
>>> Dave
>>>

* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-29 10:27       ` Wen Congyang
@ 2016-01-29 10:47         ` Dr. David Alan Gilbert
  2016-02-01  1:18           ` Wen Congyang
  0 siblings, 1 reply; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-01-29 10:47 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> >>> Hi,
> >>>   I've got a block error if I kill the secondary.
> >>>
> >>> Start both primary & secondary
> >>> kill -9 secondary qemu
> >>> x_colo_lost_heartbeat on primary
> >>>
> >>> The guest sees a block error and the ext4 root switches to read-only.
> >>>
> >>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
> >>> backtrace below.
> >>> (This is based on colo-v2.4-periodic-mode of the framework
> >>> code with the block and network proxy merged in; so it could be my
> >>> merging but I don't think so?)
> >>>
> >>>
> >>> (gdb) where
> >>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> >>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
> >>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
> >>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
> >>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> >>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> >>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> >>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> >>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> >>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> >>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> >>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> >>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> >>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >>>
> >>> (gdb) p s->num_children
> >>> $1 = 2
> >>> (gdb) p acb->success_count
> >>> $2 = 0
> >>> (gdb) p acb->is_read
> >>> $5 = false
> >>
> >> Sorry for the late reply.
> > 
> > No problem.
> > 
> >> What is the value of acb->count?
> > 
> > (gdb) p acb->count
> > $1 = 1
> 
> Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to
> children.0 succeeds, the guest does not see this error.
> >> If secondary host is down, you should remove quorum's children.1. Otherwise, you will get
> >> an I/O error event.
> > 
> > Is that safe? If the secondary fails, do you always have time to issue the command to
> > remove children.1 before the guest sees the error?
> 
> We write to both children and expect the write to children.0 to succeed. If it does,
> the guest does not see this error; you just get the I/O error event.

I think children.0 is the disk, and that should be OK, so only children.1 (the replication) should
be failing; in that case, why do I see the error?
The 'node0' in the backtrace above is the name of the replication node, so it does look like the error
is coming from the replication.

> > Anyway, I tried removing children.1, but now it segfaults; I guess the replication is unhappy:
> > 
> > (qemu) x_block_change colo-disk0 -d children.1
> > (qemu) x_colo_lost_heartbeat 
> 
> Hmm, you should not remove the child before failover. I will check how to prevent this in the code.

But you said 'If secondary host is down, you should remove quorum's children.1'; is that not
what you meant?

> > 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
> > 
> > #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
> >     at /root/colo/jan-2016/qemu/block.c:4426
> > 
> > (gdb) p drv
> > $1 = (BlockDriver *) 0x5d2a
> > 
> > It looks like the whole of bs is bogus.
> > 
> > #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>, 
> >     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
> > 
> > (gdb) p s->replication_index
> > $3 = 1
> > 
> > I guess quorum_del_child needs to stop replication before it removes the child?
> 
> Yes, but in the newest version quorum doesn't know about block replication, and I think
> we should take a reference to the bs when starting block replication.

Do you have a new version ready to test? I'd be interested in trying it (and also
the latest version of the colo-proxy).

Dave

> Thanks
> Wen Congyang
> 
> > (although it would have to be careful not to block on the dead nbd).
> > 
> > #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
> >     at /root/colo/jan-2016/qemu/block.c:4504
> > #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
> > #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
> > #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> > #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
> > #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> > #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> > #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> > #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> > #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> > #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> > #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> > #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> > 
> > Dave
> > 
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>> (qemu) info block
> >>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
> >>>     Cache mode:       writeback, direct
> >>>
> >>> Dave
> >>>
> >>> * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote:
> >>>> Block replication is a very important feature which is used for
> >>>> continuous checkpoints(for example: COLO).
> >>>>
> >>>> You can get the detailed information about block replication from here:
> >>>> http://wiki.qemu.org/Features/BlockReplication
> >>>>
> >>>> Usage:
> >>>> Please refer to docs/block-replication.txt
> >>>>
> >>>> This patch series is based on the following patch series:
> >>>> 1. http://lists.nongnu.org/archive/html/qemu-devel/2015-12/msg04570.html
> >>>>
> >>>> You can get the patch here:
> >>>> https://github.com/Pating/qemu/tree/changlox/block-replication-v13
> >>>>
> >>>> You can get the patch with framework here:
> >>>> https://github.com/Pating/qemu/tree/changlox/colo_framework_v12
> >>>>
> >>>> TODO:
> >>>> 1. Continuous block replication. It will be started after basic functions
> >>>>    are accepted.
> >>>>
> >>>> Changs Log:
> >>>> V13:
> >>>> 1. Rebase to the newest codes
> >>>> 2. Remove redundant marcos and semicolon in replication.c 
> >>>> 3. Fix typos in block-replication.txt
> >>>> V12:
> >>>> 1. Rebase to the newest codes
> >>>> 2. Use backing reference to replcace 'allow-write-backing-file'
> >>>> V11:
> >>>> 1. Reopen the backing file when starting blcok replication if it is not
> >>>>    opened in R/W mode
> >>>> 2. Unblock BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET
> >>>>    when opening backing file
> >>>> 3. Block the top BDS so there is only one block job for the top BDS and
> >>>>    its backing chain.
> >>>> V10:
> >>>> 1. Use blockdev-remove-medium and blockdev-insert-medium to replace backing
> >>>>    reference.
> >>>> 2. Address the comments from Eric Blake
> >>>> V9:
> >>>> 1. Update the error messages
> >>>> 2. Rebase to the newest qemu
> >>>> 3. Split child add/delete support. These patches are sent in another patchset.
> >>>> V8:
> >>>> 1. Address Alberto Garcia's comments
> >>>> V7:
> >>>> 1. Implement adding/removing quorum child. Remove the option non-connect.
> >>>> 2. Simplify the backing refrence option according to Stefan Hajnoczi's suggestion
> >>>> V6:
> >>>> 1. Rebase to the newest qemu.
> >>>> V5:
> >>>> 1. Address the comments from Gong Lei
> >>>> 2. Speed the failover up. The secondary vm can take over very quickly even
> >>>>    if there are too many I/O requests.
> >>>> V4:
> >>>> 1. Introduce a new driver replication to avoid touch nbd and qcow2.
> >>>> V3:
> >>>> 1: use error_setg() instead of error_set()
> >>>> 2. Add a new block job API
> >>>> 3. Active disk, hidden disk and nbd target use the same AioContext
> >>>> 4. Add a testcase to test new hbitmap API
> >>>> V2:
> >>>> 1. Redesign the secondary qemu (use image-fleecing)
> >>>> 2. Use Error objects to return error message
> >>>> 3. Address the comments from Max Reitz and Eric Blake
> >>>>
> >>>> Wen Congyang (10):
> >>>>   unblock backup operations in backing file
> >>>>   Store parent BDS in BdrvChild
> >>>>   Backup: clear all bitmap when doing block checkpoint
> >>>>   Allow creating backup jobs when opening BDS
> >>>>   docs: block replication's description
> >>>>   Add new block driver interfaces to control block replication
> >>>>   quorum: implement block driver interfaces for block replication
> >>>>   Implement new driver for block replication
> >>>>   support replication driver in blockdev-add
> >>>>   Add a new API to start/stop replication, do checkpoint to all BDSes
> >>>>
> >>>>  block.c                    | 145 ++++++++++++
> >>>>  block/Makefile.objs        |   3 +-
> >>>>  block/backup.c             |  14 ++
> >>>>  block/quorum.c             |  78 +++++++
> >>>>  block/replication.c        | 545 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>  blockjob.c                 |  11 +
> >>>>  docs/block-replication.txt | 227 +++++++++++++++++++
> >>>>  include/block/block.h      |   9 +
> >>>>  include/block/block_int.h  |  15 ++
> >>>>  include/block/blockjob.h   |  12 +
> >>>>  qapi/block-core.json       |  33 ++-
> >>>>  11 files changed, 1089 insertions(+), 3 deletions(-)
> >>>>  create mode 100644 block/replication.c
> >>>>  create mode 100644 docs/block-replication.txt
> >>>>
> >>>> -- 
> >>>> 1.9.3
> >>>>
> >>>>
> >>>>
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>
> >>>
> >>> .
> >>>
> >>
> >>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > 
> > .
> > 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-01-29 10:47         ` Dr. David Alan Gilbert
@ 2016-02-01  1:18           ` Wen Congyang
  2016-02-01 10:18             ` Dr. David Alan Gilbert
  2016-02-04  2:32             ` Changlong Xie
  0 siblings, 2 replies; 27+ messages in thread
From: Wen Congyang @ 2016-02-01  1:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini

On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>> Hi,
>>>>>   I've got a block error if I kill the secondary.
>>>>>
>>>>> Start both primary & secondary
>>>>> kill -9 secondary qemu
>>>>> x_colo_lost_heartbeat on primary
>>>>>
>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>
>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>> backtrace below.
>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>> code with the block and network proxy merged in; so it could be my
>>>>> merge, but I don't think so?)
>>>>>
>>>>>
>>>>> (gdb) where
>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
>>>>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
>>>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>
>>>>> (gdb) p s->num_children
>>>>> $1 = 2
>>>>> (gdb) p acb->success_count
>>>>> $2 = 0
>>>>> (gdb) p acb->is_read
>>>>> $5 = false
>>>>
>>>> Sorry for the late reply.
>>>
>>> No problem.
>>>
>>>> What is the value of acb->count?
>>>
>>> (gdb) p acb->count
>>> $1 = 1
>>
>> Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to children.0 succeeds,
>> the guest will not see this error.
>>>> If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
>>>> an I/O error event.
>>>
>>> Is that safe?  If the secondary fails, do you always have time to issue the command to
>>> remove children.1 before the guest sees the error?
>>
>> We write to both children and expect the write to children.0 to succeed. If it does,
>> the guest will not see this error; you just get the I/O error event.
> 
> I think children.0 is the disk, and that should be OK - so only the children.1/replication should
> be failing - so in that case why do I see the error?

I don't know yet; I will check the code.

> The 'node0' in the backtrace above is the name of the replication, so it does look like the error
> is coming from the replication.

No, the backtrace just reports an I/O error event to the management application.

> 
>>> Anyway, I tried removing children.1 but it segfaults now, I guess the replication is unhappy:
>>>
>>> (qemu) x_block_change colo-disk0 -d children.1
>>> (qemu) x_colo_lost_heartbeat 
>>
>> Hmm, you should not remove the child before failover. I will check how to prevent this in the code.
> 
>  But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
> what you meant?

Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 'x_block_change ... -d ...'.

> 
>>> 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
>>>
>>> #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
>>>     at /root/colo/jan-2016/qemu/block.c:4426
>>>
>>> (gdb) p drv
>>> $1 = (BlockDriver *) 0x5d2a
>>>
>>>   it looks like the whole of bs is bogus.
>>>
>>> #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>, 
>>>     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
>>>
>>> (gdb) p s->replication_index
>>> $3 = 1
>>>
>>> I guess quorum_del_child needs to stop replication before it removes the child?
>>
>> Yes, but in the newest version quorum doesn't know about block replication, so I think
>> we should take a reference to the bs when starting block replication.
> 
> Do you have a new version ready to test?  I'm interested to try it (and also interested
> to try the latest version of the colo-proxy)

I think we can post the newest version this week.

Thanks
Wen Congyang

> 
> Dave
> 
>> Thanks
>> Wen Congyang
>>
>>> (although it would have to be careful not to block on the dead nbd).
>>>
>>> #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
>>>     at /root/colo/jan-2016/qemu/block.c:4504
>>> #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
>>> #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
>>> #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>> #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
>>> #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>> #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>> #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>> #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>> #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>> #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>> #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>> #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>
>>> Dave
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>> (qemu) info block
>>>>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>>>>>     Cache mode:       writeback, direct
>>>>>
>>>>> Dave
>>>>>


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-02-01  1:18           ` Wen Congyang
@ 2016-02-01 10:18             ` Dr. David Alan Gilbert
  2016-02-04  2:32             ` Changlong Xie
  1 sibling, 0 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-01 10:18 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Changlong Xie, Fam Zheng, zhanghailiang, qemu block,
	Jiang Yunhong, Dong Eddie, qemu devel, Michael R. Hines,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> >>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> >>>>> Hi,
> >>>>>   I've got a block error if I kill the secondary.
> >>>>>
> >>>>> Start both primary & secondary
> >>>>> kill -9 secondary qemu
> >>>>> x_colo_lost_heartbeat on primary
> >>>>>
> >>>>> The guest sees a block error and the ext4 root switches to read-only.
> >>>>>
> >>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
> >>>>> backtrace below.
> >>>>> (This is based on colo-v2.4-periodic-mode of the framework
> >>>>> code with the block and network proxy merged in; so it could be my
> >>>>> merge, but I don't think so?)
> >>>>>
> >>>>>
> >>>>> (gdb) where
> >>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> >>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
> >>>>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
> >>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
> >>>>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> >>>>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> >>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> >>>>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> >>>>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >>>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> >>>>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>>>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> >>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> >>>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> >>>>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> >>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >>>>>
> >>>>> (gdb) p s->num_children
> >>>>> $1 = 2
> >>>>> (gdb) p acb->success_count
> >>>>> $2 = 0
> >>>>> (gdb) p acb->is_read
> >>>>> $5 = false
> >>>>
> >>>> Sorry for the late reply.
> >>>
> >>> No problem.
> >>>
> >>>> What is the value of acb->count?
> >>>
> >>> (gdb) p acb->count
> >>> $1 = 1
> >>
> >> Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to children.0 succeeds,
> >> the guest will not see this error.
> >>>> If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
> >>>> an I/O error event.
> >>>
> >>> Is that safe?  If the secondary fails, do you always have time to issue the command to
> >>> remove children.1 before the guest sees the error?
> >>
> >> We write to both children and expect the write to children.0 to succeed. If it does,
> >> the guest will not see this error; you just get the I/O error event.
> > 
> > I think children.0 is the disk, and that should be OK - so only the children.1/replication should
> > be failing - so in that case why do I see the error?
> 
> I don't know yet; I will check the code.
> 
> > The 'node0' in the backtrace above is the name of the replication, so it does look like the error
> > is coming from the replication.
> 
> No, the backtrace just reports an I/O error event to the management application.

OK, but the guest did see the error as well, so I'd assumed it was that path.

> >>> Anyway, I tried removing children.1 but it segfaults now, I guess the replication is unhappy:
> >>>
> >>> (qemu) x_block_change colo-disk0 -d children.1
> >>> (qemu) x_colo_lost_heartbeat 
> >>
> >> Hmm, you should not remove the child before failover. I will check how to prevent this in the code.
> > 
> >  But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
> > what you meant?
> 
> Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 'x_block_change ... -d ...'.

OK, but then that's quite separate from the problem with the guest seeing it.

> >>> 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
> >>>
> >>> #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
> >>>     at /root/colo/jan-2016/qemu/block.c:4426
> >>>
> >>> (gdb) p drv
> >>> $1 = (BlockDriver *) 0x5d2a
> >>>
> >>>   it looks like the whole of bs is bogus.
> >>>
> >>> #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>, 
> >>>     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
> >>>
> >>> (gdb) p s->replication_index
> >>> $3 = 1
> >>>
> >>> I guess quorum_del_child needs to stop replication before it removes the child?
> >>
> >> Yes, but in the newest version quorum doesn't know about block replication, so I think
> >> we should take a reference to the bs when starting block replication.
> > 
> > Do you have a new version ready to test?  I'm interested to try it (and also interested
> > to try the latest version of the colo-proxy)
> 
> I think we can post the newest version this week.

Thanks, that would be great.

Dave

> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> >> Thanks
> >> Wen Congyang
> >>
> >>> (although it would have to be careful not to block on the dead nbd).
> >>>
> >>> #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
> >>>     at /root/colo/jan-2016/qemu/block.c:4504
> >>> #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
> >>> #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
> >>> #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> >>> #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
> >>> #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> >>> #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, 
> >>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> >>> #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>> #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> >>> #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> >>> #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> >>> #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> >>> #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >>>
> >>> Dave
> >>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>> (qemu) info block
> >>>>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
> >>>>>     Cache mode:       writeback, direct
> >>>>>
> >>>>> Dave
> >>>>>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-02-01  1:18           ` Wen Congyang
  2016-02-01 10:18             ` Dr. David Alan Gilbert
@ 2016-02-04  2:32             ` Changlong Xie
  2016-02-04  9:07               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 27+ messages in thread
From: Changlong Xie @ 2016-02-04  2:32 UTC (permalink / raw)
  To: Wen Congyang, Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini

On 02/01/2016 09:18 AM, Wen Congyang wrote:
> On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>>> Hi,
>>>>>>    I've got a block error if I kill the secondary.
>>>>>>
>>>>>> Start both primary & secondary
>>>>>> kill -9 secondary qemu
>>>>>> x_colo_lost_heartbeat on primary
>>>>>>
>>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>>
>>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>>> backtrace below.
>>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>>> code with the block and network proxy merged in; so it could be my
>>>>>> merge, but I don't think so?)
>>>>>>
>>>>>>
>>>>>> (gdb) where
>>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>>>      at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>>>>>>      at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
>>>>>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>>>      user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>>
>>>>>> (gdb) p s->num_children
>>>>>> $1 = 2
>>>>>> (gdb) p acb->success_count
>>>>>> $2 = 0
>>>>>> (gdb) p acb->is_read
>>>>>> $5 = false
>>>>>
>>>>> Sorry for the late reply.
>>>>
>>>> No problem.
>>>>
>>>>> What is the value of acb->count?
>>>>
>>>> (gdb) p acb->count
>>>> $1 = 1
>>>
>>> Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to children.0 succeeds,
>>> the guest will not see this error.
>>>>> If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
>>>>> an I/O error event.
>>>>
>>>> Is that safe?  If the secondary fails, do you always have time to issue the command to
>>>> remove children.1 before the guest sees the error?
>>>
>>> We write to both children and expect the write to children.0 to succeed. If it does,
>>> the guest will not see this error; you just get the I/O error event.
>>
>> I think children.0 is the disk, and that should be OK - so only the children.1/replication should
>> be failing - so in that case why do I see the error?
>
> I don't know; I will check the code.
>
>> The 'node0' in the backtrace above is the name of the replication, so it does look like the error
>> is coming from the replication.
>
> No, the backtrace just reports an I/O error event to the management application.
>
>>
>>>> Anyway, I tried removing children.1, but now it segfaults; I guess the replication is unhappy:
>>>>
>>>> (qemu) x_block_change colo-disk0 -d children.1
>>>> (qemu) x_colo_lost_heartbeat
>>>
>>> Hmm, you should not remove the child before failover. I will check how to avoid it in the code.
>>
>>   But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
>> what you meant?
>
> Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 'x_block_change ... -d ...'.
>
Hi David
	
It seems we missed the 'drive_del' command; we will document it in the next
version. Here is the correct command order:

{ "execute": "x-colo-lost-heartbeat" }
{ 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk', 
'child': 'children.1'}}
{ 'execute': 'human-monitor-command', 'arguments': {'command-line': 
'drive_del xxxxx'}}

Thanks
	-Xie
>>
>>>> 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
>>>>
>>>> #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
>>>>      at /root/colo/jan-2016/qemu/block.c:4426
>>>>
>>>> (gdb) p drv
>>>> $1 = (BlockDriver *) 0x5d2a
>>>>
>>>>    it looks like the whole of bs is bogus.
>>>>
>>>> #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>,
>>>>      errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
>>>>
>>>> (gdb) p s->replication_index
>>>> $3 = 1
>>>>
>>>> I guess quorum_del_child needs to stop replication before it removes the child?
>>>
>>> Yes, but in the newest version, quorum doesn't know about block replication, and I think
>>> we should add a reference to the bs when starting block replication.
>>
>> Do you have a new version ready to test?  I'm interested to try it (and also interested
>> to try the latest version of the colo-proxy)
>
> I think we can post the newest version this week.
>
> Thanks
> Wen Congyang
>
>>
>> Dave
>>
>>> Thanks
>>> Wen Congyang
>>>
>>>> (although it would have to be careful not to block on the dead nbd).
>>>>
>>>> #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
>>>>      at /root/colo/jan-2016/qemu/block.c:4504
>>>> #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
>>>> #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
>>>> #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>> #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
>>>> #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>> #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>      user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>> #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>> #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>> #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>> #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>> #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>> #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>
>>>> Dave
>>>>
>>>>> Thanks
>>>>> Wen Congyang
>>>>>
>>>>>>
>>>>>> (qemu) info block
>>>>>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>>>>>>      Cache mode:       writeback, direct
>>>>>>
>>>>>> Dave
>>>>>>


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-02-04  2:32             ` Changlong Xie
@ 2016-02-04  9:07               ` Dr. David Alan Gilbert
  2016-02-04  9:16                 ` Wen Congyang
  2016-02-04 10:17                 ` Changlong Xie
  0 siblings, 2 replies; 27+ messages in thread
From: Dr. David Alan Gilbert @ 2016-02-04  9:07 UTC (permalink / raw)
  To: Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, zhanghailiang

* Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote:
> On 02/01/2016 09:18 AM, Wen Congyang wrote:
> >On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> >>* Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>>On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> >>>>* Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>>>>On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> >>>>>>Hi,
> >>>>>>   I've got a block error if I kill the secondary.
> >>>>>>
> >>>>>>Start both primary & secondary
> >>>>>>kill -9 secondary qemu
> >>>>>>x_colo_lost_heartbeat on primary
> >>>>>>
> >>>>>>The guest sees a block error and the ext4 root switches to read-only.
> >>>>>>
> >>>>>>I gdb'd the primary with a breakpoint on quorum_report_bad; see
> >>>>>>backtrace below.
> >>>>>>(This is based on colo-v2.4-periodic-mode of the framework
> >>>>>>code with the block and network proxy merged in; so it could be my
> >>>>>>merging but I don't think so ?)
> >>>>>>
> >>>>>>
> >>>>>>(gdb) where
> >>>>>>#0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> >>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
> >>>>>>#1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
> >>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
> >>>>>>#2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> >>>>>>#3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> >>>>>>#4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> >>>>>>#5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> >>>>>>#6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
> >>>>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> >>>>>>#7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>>>>>#8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> >>>>>>#9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> >>>>>>#10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> >>>>>>#11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> >>>>>>#12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >>>>>>
> >>>>>>(gdb) p s->num_children
> >>>>>>$1 = 2
> >>>>>>(gdb) p acb->success_count
> >>>>>>$2 = 0
> >>>>>>(gdb) p acb->is_read
> >>>>>>$5 = false
> >>>>>
> >>>>>Sorry for the late reply.
> >>>>
> >>>>No problem.
> >>>>
> >>>>>What is the value of acb->count?
> >>>>
> >>>>(gdb) p acb->count
> >>>>$1 = 1
> >>>
> >>>Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to children.0 succeeds,
> >>>the guest never sees this error.
> >>>>>If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
> >>>>>an I/O error event.
> >>>>
> >>>>Is that safe?  If the secondary fails, do you always have time to issue the command to
> >>>>remove the children.1  before the guest sees the error?
> >>>
> >>>We write to both children and expect the write to children.0 to succeed. If it does,
> >>>the guest never sees this error; you just get the I/O error event.
> >>
> >>I think children.0 is the disk, and that should be OK - so only the children.1/replication should
> >>be failing - so in that case why do I see the error?
> >
> >I don't know; I will check the code.
> >
> >>The 'node0' in the backtrace above is the name of the replication, so it does look like the error
> >>is coming from the replication.
> >
> >No, the backtrace just reports an I/O error event to the management application.
> >
> >>
> >>>>Anyway, I tried removing children.1, but now it segfaults; I guess the replication is unhappy:
> >>>>
> >>>>(qemu) x_block_change colo-disk0 -d children.1
> >>>>(qemu) x_colo_lost_heartbeat
> >>>
> >>>Hmm, you should not remove the child before failover. I will check how to avoid it in the code.
> >>
> >>  But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
> >>what you meant?
> >
> >Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 'x_block_change ... -d ...'.
> >
> Hi David

Hi Xie,
  Thanks for the response.

> It seems we missed the 'drive_del' command; we will document it in the next
> version. Here is the correct command order:
> 
> { "execute": "x-colo-lost-heartbeat" }
> { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk',
> 'child': 'children.1'}}
> { 'execute': 'human-monitor-command', 'arguments': {'command-line':
> 'drive_del xxxxx'}}

OK; however, you should still fix the segfault that occurs if you don't issue the drive_del;
QEMU should never crash.
(Also, I still get the I/O error in the guest if I do the x-colo-lost-heartbeat.)

Dave

> Thanks
> 	-Xie
> >>
> >>>>12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
> >>>>
> >>>>#0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
> >>>>     at /root/colo/jan-2016/qemu/block.c:4426
> >>>>
> >>>>(gdb) p drv
> >>>>$1 = (BlockDriver *) 0x5d2a
> >>>>
> >>>>   it looks like the whole of bs is bogus.
> >>>>
> >>>>#1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>,
> >>>>     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
> >>>>
> >>>>(gdb) p s->replication_index
> >>>>$3 = 1
> >>>>
> >>>>I guess quorum_del_child needs to stop replication before it removes the child?
> >>>
> >>>Yes, but in the newest version, quorum doesn't know about block replication, and I think
> >>>we should add a reference to the bs when starting block replication.
> >>
> >>Do you have a new version ready to test?  I'm interested to try it (and also interested
> >>to try the latest version of the colo-proxy)
> >
> >I think we can post the newest version this week.
> >
> >Thanks
> >Wen Congyang
> >
> >>
> >>Dave
> >>
> >>>Thanks
> >>>Wen Congyang
> >>>
> >>>>(although it would have to be careful not to block on the dead nbd).
> >>>>
> >>>>#2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
> >>>>     at /root/colo/jan-2016/qemu/block.c:4504
> >>>>#3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
> >>>>#4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
> >>>>#5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> >>>>#6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
> >>>>#7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> >>>>#8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
> >>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> >>>>#9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> >>>>#10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> >>>>#11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> >>>>#12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> >>>>#13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> >>>>#14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >>>>
> >>>>Dave
> >>>>
> >>>>>Thanks
> >>>>>Wen Congyang
> >>>>>
> >>>>>>
> >>>>>>(qemu) info block
> >>>>>>colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
> >>>>>>     Cache mode:       writeback, direct
> >>>>>>
> >>>>>>Dave
> >>>>>>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-02-04  9:07               ` Dr. David Alan Gilbert
@ 2016-02-04  9:16                 ` Wen Congyang
  2016-02-04 10:17                 ` Changlong Xie
  1 sibling, 0 replies; 27+ messages in thread
From: Wen Congyang @ 2016-02-04  9:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Changlong Xie
  Cc: Kevin Wolf, Fam Zheng, zhanghailiang, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini

On 02/04/2016 05:07 PM, Dr. David Alan Gilbert wrote:
> * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote:
>> On 02/01/2016 09:18 AM, Wen Congyang wrote:
>>> On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>> Hi,
>>>>>>>>   I've got a block error if I kill the secondary.
>>>>>>>>
>>>>>>>> Start both primary & secondary
>>>>>>>> kill -9 secondary qemu
>>>>>>>> x_colo_lost_heartbeat on primary
>>>>>>>>
>>>>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>>>>
>>>>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>>>>> backtrace below.
>>>>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>>>>> code with the block and network proxy merged in; so it could be my
>>>>>>>> merging but I don't think so ?)
>>>>>>>>
>>>>>>>>
>>>>>>>> (gdb) where
>>>>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>>>>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>>>>>>>>     at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>>>>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
>>>>>>>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>>>>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>>>>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>>>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>>>>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>>>>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>>>>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>>>>
>>>>>>>> (gdb) p s->num_children
>>>>>>>> $1 = 2
>>>>>>>> (gdb) p acb->success_count
>>>>>>>> $2 = 0
>>>>>>>> (gdb) p acb->is_read
>>>>>>>> $5 = false
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>
>>>>>> No problem.
>>>>>>
>>>>>>> What is the value of acb->count?
>>>>>>
>>>>>> (gdb) p acb->count
>>>>>> $1 = 1
>>>>>
>>>>> Note that the count is 1, not 2: the write to children.0 is still in flight. If the write to children.0 succeeds,
>>>>> the guest never sees this error.
>>>>>>> If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
>>>>>>> an I/O error event.
>>>>>>
>>>>>> Is that safe?  If the secondary fails, do you always have time to issue the command to
>>>>>> remove the children.1  before the guest sees the error?
>>>>>
>>>>> We write to both children and expect the write to children.0 to succeed. If it does,
>>>>> the guest never sees this error; you just get the I/O error event.
>>>>
>>>> I think children.0 is the disk, and that should be OK - so only the children.1/replication should
>>>> be failing - so in that case why do I see the error?
>>>
>>> I don't know; I will check the code.
>>>
>>>> The 'node0' in the backtrace above is the name of the replication, so it does look like the error
>>>> is coming from the replication.
>>>
>>> No, the backtrace just reports an I/O error event to the management application.
>>>
>>>>
>>>>>> Anyway, I tried removing children.1, but now it segfaults; I guess the replication is unhappy:
>>>>>>
>>>>>> (qemu) x_block_change colo-disk0 -d children.1
>>>>>> (qemu) x_colo_lost_heartbeat
>>>>>
>>>>> Hmm, you should not remove the child before failover. I will check how to avoid it in the code.
>>>>
>>>>  But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
>>>> what you meant?
>>>
>>> Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 'x_block_change ... -d ...'.
>>>
>> Hi David
> 
> Hi Xie,
>   Thanks for the response.
> 
>> It seems we missed the 'drive_del' command; we will document it in the next
>> version. Here is the correct command order:
>>
>> { "execute": "x-colo-lost-heartbeat" }
>> { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk',
>> 'child': 'children.1'}}
>> { 'execute': 'human-monitor-command', 'arguments': {'command-line':
>> 'drive_del xxxxx'}}
> 
> OK; however, you should still fix the segfault that occurs if you don't issue the drive_del;
> QEMU should never crash.

We will post the newest version tomorrow; please try the newest code.

> (Also, I still get the I/O error in the guest if I do the x-colo-lost-heartbeat.)

I think it is a bug in quorum: the quorum flush function returns this error. children.0
returns 0 and children.1 returns an error; in this case, I think we should return 0
to the caller.

Thanks
Wen Congyang

> 
> Dave
> 
>> Thanks
>> 	-Xie
>>>>
>>>>>> 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
>>>>>>
>>>>>> #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
>>>>>>     at /root/colo/jan-2016/qemu/block.c:4426
>>>>>>
>>>>>> (gdb) p drv
>>>>>> $1 = (BlockDriver *) 0x5d2a
>>>>>>
>>>>>>   it looks like the whole of bs is bogus.
>>>>>>
>>>>>> #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>,
>>>>>>     errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
>>>>>>
>>>>>> (gdb) p s->replication_index
>>>>>> $3 = 1
>>>>>>
>>>>>> I guess quorum_del_child needs to stop replication before it removes the child?
>>>>>
>>>>> Yes, but in the newest version, quorum doesn't know about block replication, and I think
>>>>> we should add a reference to the bs when starting block replication.
>>>>
>>>> Do you have a new version ready to test?  I'm interested to try it (and also interested
>>>> to try the latest version of the colo-proxy)
>>>
>>> I think we can post the newest version this week.
>>>
>>> Thanks
>>> Wen Congyang
>>>
>>>>
>>>> Dave
>>>>
>>>>> Thanks
>>>>> Wen Congyang
>>>>>
>>>>>> (although it would have to be careful not to block on the dead nbd).
>>>>>>
>>>>>> #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
>>>>>>     at /root/colo/jan-2016/qemu/block.c:4504
>>>>>> #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
>>>>>> #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
>>>>>> #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>>> #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>>> #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>>> #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>>>     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>>> #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>>> #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>>> #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>>> #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>>> #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>>> #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>>> Thanks
>>>>>>> Wen Congyang
>>>>>>>
>>>>>>>>
>>>>>>>> (qemu) info block
>>>>>>>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>>>>>>>>     Cache mode:       writeback, direct
>>>>>>>>
>>>>>>>> Dave
>>>>>>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> 
> .
> 
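Wen's suggestion in the thread above, taking a reference on the replication child when replication starts, can be illustrated with a small standalone model. This is a minimal sketch, not QEMU code: Node, Parent, node_ref(), node_unref(), start_replication(), del_child() and stop_replication() are hypothetical names, loosely following the idea behind QEMU's bdrv_ref()/bdrv_unref(). Holding a pointer plus a reference, instead of a bare child index such as replication_index, keeps the node alive when the child is deleted before failover, which is the situation that left the bogus bs in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a block node (not QEMU's BlockDriverState). */
typedef struct Node {
    int refcnt;
    bool replicating;
} Node;

/* Counts freed nodes so the lifetime can be checked. */
static int nodes_freed;

static Node *node_new(void)
{
    Node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->refcnt = 1;                 /* the reference owned by the parent */
    return n;
}

static void node_ref(Node *n)
{
    n->refcnt++;
}

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        free(n);
        nodes_freed++;
    }
}

#define MAX_CHILDREN 4

/* Hypothetical stand-in for the quorum state. */
typedef struct Parent {
    Node *children[MAX_CHILDREN];
    int num_children;
    Node *replication_child;       /* pinned pointer instead of a bare index */
} Parent;

static void start_replication(Parent *p, int index)
{
    assert(index >= 0 && index < p->num_children);
    p->replication_child = p->children[index];
    node_ref(p->replication_child);    /* keep it alive while replicating */
    p->replication_child->replicating = true;
}

/* Deleting a child drops only the parent's own reference. */
static void del_child(Parent *p, int index)
{
    assert(index >= 0 && index < p->num_children);
    Node *victim = p->children[index];
    for (int i = index; i < p->num_children - 1; i++) {
        p->children[i] = p->children[i + 1];
    }
    p->num_children--;
    node_unref(victim);
}

/* Safe even if the child was deleted first: we still hold a reference. */
static bool stop_replication(Parent *p)
{
    Node *n = p->replication_child;
    if (n == NULL) {
        return false;              /* not started, or already stopped */
    }
    n->replicating = false;
    p->replication_child = NULL;
    node_unref(n);                 /* may free the node now */
    return true;
}
```

For example, start_replication(&p, 1), then del_child(&p, 1), then stop_replication(&p) never touches freed memory; the deleted node is released only by the final node_unref() inside stop_replication().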


* Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints
  2016-02-04  9:07               ` Dr. David Alan Gilbert
  2016-02-04  9:16                 ` Wen Congyang
@ 2016-02-04 10:17                 ` Changlong Xie
  1 sibling, 0 replies; 27+ messages in thread
From: Changlong Xie @ 2016-02-04 10:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Michael R. Hines, Max Reitz, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, zhanghailiang

On 02/04/2016 05:07 PM, Dr. David Alan Gilbert wrote:
> * Changlong Xie (xiecl.fnst@cn.fujitsu.com) wrote:
>> On 02/01/2016 09:18 AM, Wen Congyang wrote:
>>> On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
>>>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>>>> On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>> Hi,
>>>>>>>>    I've got a block error if I kill the secondary.
>>>>>>>>
>>>>>>>> Start both primary & secondary
>>>>>>>> kill -9 secondary qemu
>>>>>>>> x_colo_lost_heartbeat on primary
>>>>>>>>
>>>>>>>> The guest sees a block error and the ext4 root switches to read-only.
>>>>>>>>
>>>>>>>> I gdb'd the primary with a breakpoint on quorum_report_bad; see
>>>>>>>> backtrace below.
>>>>>>>> (This is based on colo-v2.4-periodic-mode of the framework
>>>>>>>> code with the block and network proxy merged in; so it could be my
>>>>>>>> merging but I don't think so?)
>>>>>>>>
>>>>>>>>
>>>>>>>> (gdb) where
>>>>>>>> #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
>>>>>>>>      at /root/colo/jan-2016/qemu/block/quorum.c:222
>>>>>>>> #1  0x00007f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
>>>>>>>>      at /root/colo/jan-2016/qemu/block/quorum.c:315
>>>>>>>> #2  0x00007f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
>>>>>>>> #3  0x00007f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>>>>> #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>>>>> #5  0x00007f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>>>>> #6  0x00007f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>>>>>      user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>>>>> #7  0x00007f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>>>>> #8  0x00007f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>>>>> #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>>>>> #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>>>>> #11 0x00007f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>>>>> #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>>>>
>>>>>>>> (gdb) p s->num_children
>>>>>>>> $1 = 2
>>>>>>>> (gdb) p acb->success_count
>>>>>>>> $2 = 0
>>>>>>>> (gdb) p acb->is_read
>>>>>>>> $5 = false
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>
>>>>>> No problem.
>>>>>>
>>>>>>> What is the value of acb->count?
>>>>>>
>>>>>> (gdb) p acb->count
>>>>>> $1 = 1
>>>>>
>>>>> Note, the count is 1, not 2: the write to children.0 is still in flight. If the write to children.0 succeeds,
>>>>> the guest never sees this error.
>>>>>>> If the secondary host is down, you should remove quorum's children.1. Otherwise, you will get
>>>>>>> an I/O error event.
>>>>>>
>>>>>> Is that safe?  If the secondary fails, do you always have time to issue the command to
>>>>>> remove the children.1  before the guest sees the error?
>>>>>
>>>>> We write to both children and expect the write to children.0 to succeed. If it does,
>>>>> the guest never sees this error; you just get the I/O error event.
>>>>
>>>> I think children.0 is the disk, and that should be OK - so only the children.1/replication should
>>>> be failing - so in that case why do I see the error?
>>>
>>> I don't know; I will check the code.
>>>
>>>> The 'node0' in the backtrace above is the name of the replication, so it does look like the error
>>>> is coming from the replication.
>>>
>>> No, the backtrace just reports an I/O error event to the management application.
>>>
>>>>
>>>>>> Anyway, I tried removing children.1 but it segfaults now; I guess the replication is unhappy:
>>>>>>
>>>>>> (qemu) x_block_change colo-disk0 -d children.1
>>>>>> (qemu) x_colo_lost_heartbeat
>>>>>
>>>>> Hmm, you should not remove the child before failover. I will check how to avoid this in the code.
>>>>
>>>>   But you said 'If secondary host is down, you should remove quorum's children.1' - is that not
>>>> what you meant?
>>>
>>> Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute 'x_block_change ... -d ...'.
>>>
>> Hi David
>
> Hi Xie,
>    Thanks for the response.
>
>> It seems we missed the 'drive_del' command; we will document it in the next
>> version. Here is the correct command order:
>>
>> { "execute": "x-colo-lost-heartbeat" }
>> { 'execute': 'x-blockdev-change', 'arguments': {'parent': 'colo-disk', 'child': 'children.1'}}
>> { 'execute': 'human-monitor-command', 'arguments': {'command-line': 'drive_del xxxxx'}}
>
> OK,  however, you should fix the seg fault if you don't issue the drive_del;
> qemu should never crash.
> (Also I still get the IO error in the guest if I do the x-colo-lost-heartbeat).
>

Here is a quick fix; I have tested it several times and it works well for me.

     bugfix

     Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>

diff --git a/block/quorum.c b/block/quorum.c
index e5a7e4f..f4f1d28 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -458,6 +458,11 @@ static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
         if (candidate->vote_count > max) {
             max = candidate->vote_count;
             winner = candidate;
+            continue;
+        }
+        if (candidate->vote_count == max &&
+                    candidate->value.l > winner->value.l) {
+            winner = candidate;
         }
     }

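The patch only changes how quorum_get_vote_winner() breaks ties: when two candidates have the same vote count, the one with the larger value.l now wins, so the result no longer depends on the order of the vote list. The standalone sketch below compares the two behaviours. Candidate, winner_before() and winner_after() are hypothetical names, the QLIST is flattened into an array, and the winner != NULL guard is an addition for this standalone version (presumably unnecessary in QEMU, where every candidate in the vote list has at least one vote).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened stand-in for QuorumVoteVersion. */
typedef struct Candidate {
    long value;        /* plays the role of candidate->value.l */
    int vote_count;
} Candidate;

/* Behaviour before the patch: the first candidate with the highest count wins. */
static const Candidate *winner_before(const Candidate *c, size_t n)
{
    const Candidate *winner = NULL;
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].vote_count > max) {
            max = c[i].vote_count;
            winner = &c[i];
        }
    }
    return winner;
}

/* Behaviour after the patch: ties on vote_count go to the larger value. */
static const Candidate *winner_after(const Candidate *c, size_t n)
{
    const Candidate *winner = NULL;
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].vote_count > max) {
            max = c[i].vote_count;
            winner = &c[i];
            continue;
        }
        if (winner != NULL && c[i].vote_count == max &&
            c[i].value > winner->value) {
            winner = &c[i];
        }
    }
    return winner;
}
```

With the candidates {-28, 1 vote} and {-5, 1 vote} in that order, winner_before() returns -28 while winner_after() returns -5; with the order reversed, both return -5.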

> Dave
>
>> Thanks
>> 	-Xie
>>>>
>>>>>> 12973 Segmentation fault      (core dumped) ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name debug-threads=on -trace events=trace-file -device virtio-rng-pci $block_param $net_param
>>>>>>
>>>>>> #0  0x00007f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, failover=true, errp=0x7fff6a5c3420)
>>>>>>      at /root/colo/jan-2016/qemu/block.c:4426
>>>>>>
>>>>>> (gdb) p drv
>>>>>> $1 = (BlockDriver *) 0x5d2a
>>>>>>
>>>>>>    it looks like the whole of bs is bogus.
>>>>>>
>>>>>> #1  0x00007f0a398d87f6 in quorum_stop_replication (bs=<optimized out>, failover=<optimized out>,
>>>>>>      errp=<optimized out>) at /root/colo/jan-2016/qemu/block/quorum.c:1213
>>>>>>
>>>>>> (gdb) p s->replication_index
>>>>>> $3 = 1
>>>>>>
>>>>>> I guess quorum_del_child needs to stop replication before it removes the child?
>>>>>
>>>>> Yes, but in the newest version, quorum doesn't know about block replication, and I think
>>>>> we should add a reference to the bs when starting block replication.
>>>>
>>>> Do you have a new version ready to test?  I'm interested to try it (and also interested
>>>> to try the latest version of the colo-proxy)
>>>
>>> I think we can post the newest version this week.
>>>
>>> Thanks
>>> Wen Congyang
>>>
>>>>
>>>> Dave
>>>>
>>>>> Thanks
>>>>> Wen Congyang
>>>>>
>>>>>> (although it would have to be careful not to block on the dead nbd).
>>>>>>
>>>>>> #2  0x00007f0a398a8901 in bdrv_stop_replication_all (failover=failover@entry=true, errp=errp@entry=0x7fff6a5c3478)
>>>>>>      at /root/colo/jan-2016/qemu/block.c:4504
>>>>>> #3  0x00007f0a3984b0af in primary_vm_do_failover () at /root/colo/jan-2016/qemu/migration/colo.c:144
>>>>>> #4  colo_do_failover (s=<optimized out>) at /root/colo/jan-2016/qemu/migration/colo.c:162
>>>>>> #5  0x00007f0a3989d7fd in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
>>>>>> #6  aio_bh_poll (ctx=ctx@entry=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/async.c:92
>>>>>> #7  0x00007f0a398ab110 in aio_dispatch (ctx=0x7f0a3a6c21d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
>>>>>> #8  0x00007f0a3989d5ee in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
>>>>>>      user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
>>>>>> #9  0x00007f0a3160079a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
>>>>>> #10 0x00007f0a398a9a80 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
>>>>>> #11 os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
>>>>>> #12 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
>>>>>> #13 0x00007f0a396089ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
>>>>>> #14 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>>> Thanks
>>>>>>> Wen Congyang
>>>>>>>
>>>>>>>>
>>>>>>>> (qemu) info block
>>>>>>>> colo-disk0 (#block080): json:{"children": [{"driver": "raw", "file": {"driver": "file", "filename": "/root/colo/bugzilla.raw"}}, {"driver": "replication", "mode": "primary", "file": {"port": "8889", "host": "ibpair", "driver": "nbd", "export": "colo-disk0"}}], "driver": "quorum", "blkverify": false, "rewrite-corrupted": false, "vote-threshold": 1} (quorum)
>>>>>>>>      Cache mode:       writeback, direct
>>>>>>>>
>>>>>>>> Dave
>>>>>>>>
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> .
>
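Wen's reading of the gdb output above (acb->count is 1, acb->success_count is 0, and the guest should not see the error if the write to children.0 succeeds) can be illustrated with a simplified model of the per-request accounting for a quorum write with "vote-threshold": 1. This is a sketch under stated assumptions, not QEMU's implementation: WriteAcb and child_write_done() are hypothetical, a per-child failure is merely counted at the point where the backtrace shows quorum_report_bad() being called, and when the request does fail the sketch returns the last error seen rather than deciding by vote.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one quorum write request (not QEMU's QuorumAIOCB). */
typedef struct WriteAcb {
    int num_children;   /* children the write was sent to */
    int threshold;      /* vote-threshold */
    int count;          /* completions seen so far */
    int success_count;  /* completions that returned 0 */
    int bad_events;     /* per-child failures reported to management */
    int last_error;     /* most recent negative errno, 0 if none */
} WriteAcb;

/*
 * Called once per child completion. Returns true when every child has
 * completed, and then stores in *guest_ret what the guest would see.
 */
static bool child_write_done(WriteAcb *acb, int ret, int *guest_ret)
{
    acb->count++;
    if (ret == 0) {
        acb->success_count++;
    } else {
        acb->bad_events++;      /* an event for management, not yet a guest error */
        acb->last_error = ret;
    }
    assert(acb->count <= acb->num_children);

    if (acb->count < acb->num_children) {
        return false;           /* other children are still in flight */
    }
    /* Fail the guest request only if too few children succeeded. */
    *guest_ret = acb->success_count >= acb->threshold ? 0 : acb->last_error;
    return true;
}
```

With two children and a threshold of 1, a failed write to children.1 (-5, i.e. -EIO) followed by a successful write to children.0 completes with 0 and one per-child event; only when both children fail does the guest see the error.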


end of thread

Thread overview: 27+ messages
2015-12-25 10:30 [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 01/10] unblock backup operations in backing file Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 02/10] Store parent BDS in BdrvChild Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 03/10] Backup: clear all bitmap when doing block checkpoint Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 04/10] Allow creating backup jobs when opening BDS Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 05/10] docs: block replication's description Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 06/10] Add new block driver interfaces to control block replication Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 07/10] quorum: implement block driver interfaces for " Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 08/10] Implement new driver " Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 09/10] support replication driver in blockdev-add Changlong Xie
2015-12-25 10:30 ` [Qemu-devel] [PATCH v13 10/10] Add a new API to start/stop replication, do checkpoint to all BDSes Changlong Xie
2016-01-22 15:14 ` [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints Dr. David Alan Gilbert
2016-01-25  1:06   ` Wen Congyang
2016-01-25 12:10     ` Dr. David Alan Gilbert
2016-01-25  1:20   ` Wen Congyang
2016-01-25 11:56     ` Dr. David Alan Gilbert
2016-01-27 11:03 ` Dr. David Alan Gilbert
2016-01-29  6:52   ` Wen Congyang
2016-01-29 10:07     ` Dr. David Alan Gilbert
2016-01-29 10:27       ` Wen Congyang
2016-01-29 10:47         ` Dr. David Alan Gilbert
2016-02-01  1:18           ` Wen Congyang
2016-02-01 10:18             ` Dr. David Alan Gilbert
2016-02-04  2:32             ` Changlong Xie
2016-02-04  9:07               ` Dr. David Alan Gilbert
2016-02-04  9:16                 ` Wen Congyang
2016-02-04 10:17                 ` Changlong Xie
