All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints
@ 2015-06-18  8:49 Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description Wen Congyang
                   ` (15 more replies)
  0 siblings, 16 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Stefan Hajnoczi, Yang Hongyang

Block replication is a very important feature which is used for
continuous checkpoints(for example: COLO).

Usage:
Please refer to docs/block-replication.txt

You can get the patch here:
https://github.com/wencongyang/qemu-colo/commits/block-replication-v6

The other newest COLO patchse will be sent soon.

Note: you should apply the following patch first:
http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg01317.html

TODO:
1. Continuous block replication. It will be started after basic functions
   are accepted.

Changs Log:
V6:
1. Rebase to the newest qemu.
V5:
1. Address the comments from Gong Lei
2. Speed the failover up. The secondary vm can take over very quickly even
   if there are too many I/O requests.
V4:
1. Introduce a new driver replication to avoid touch nbd and qcow2.
V3:
1: use error_setg() instead of error_set()
2. Add a new block job API
3. Active disk, hidden disk and nbd target uses the same AioContext
4. Add a testcase to test new hbitmap API
V2:
1. Redesign the secondary qemu(use image-fleecing)
2. Use Error objects to return error message
3. Address the comments from Max Reitz and Eric Blake

Wen Congyang (16):
  docs: block replication's description
  allow writing to the backing file
  Allow creating backup jobs when opening BDS
  block: Parse "backing_reference" option to reference existing BDS
  Backup: clear all bitmap when doing block checkpoint
  Don't allow a disk use backing reference target
  Add new block driver interface to connect/disconnect the remote target
  NBD client: implement block driver interfaces to connect/disconnect
    NBD server
  Introduce a new -drive option to control whether to connect to remote
    target
  NBD client: connect to nbd server later
  Add new block driver interfaces to control block replication
  skip nbd_target when starting block replication
  quorum: implement block driver interfaces for block replication
  introduce a new API qemu_opts_absorb_qdict_by_index()
  quorum: allow ignoring child errors
  Implement new driver for block replication

 block.c                    | 279 +++++++++++++++++++++++++++-
 block/Makefile.objs        |   3 +-
 block/backup.c             |  13 ++
 block/nbd.c                |  69 +++++--
 block/quorum.c             | 162 ++++++++++++++++-
 block/replication.c        | 441 +++++++++++++++++++++++++++++++++++++++++++++
 blockdev.c                 |   8 +
 blockjob.c                 |  10 +
 docs/block-replication.txt | 179 ++++++++++++++++++
 include/block/block.h      |  10 +
 include/block/block_int.h  |  18 ++
 include/block/blockjob.h   |  12 ++
 include/qemu/option.h      |   2 +
 qapi/block.json            |  16 ++
 qemu-options.hx            |   4 +
 tests/qemu-iotests/051     |  13 ++
 tests/qemu-iotests/051.out |  13 ++
 util/qemu-option.c         |  44 +++++
 18 files changed, 1266 insertions(+), 30 deletions(-)
 create mode 100644 block/replication.c
 create mode 100644 docs/block-replication.txt

-- 
2.4.3

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 10:34   ` Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 02/16] allow writing to the backing file Wen Congyang
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 docs/block-replication.txt | 179 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
 create mode 100644 docs/block-replication.txt

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
new file mode 100644
index 0000000..a29f51a
--- /dev/null
+++ b/docs/block-replication.txt
@@ -0,0 +1,179 @@
+Block replication
+----------------------------------------
+Copyright Fujitsu, Corp. 2015
+Copyright (c) 2015 Intel Corporation
+Copyright (c) 2015 HUAWEI TECHNOLOGIES CO., LTD.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+Block replication is used for continuous checkpoints. It is designed
+for COLO (COurse-grain LOck-stepping) where the Secondary VM is running.
+It can also be applied for FT/HA (Fault-tolerance/High Assurance) scenario,
+where the Secondary VM is not running.
+
+This document gives an overview of block replication's design.
+
+== Background ==
+High availability solutions such as micro checkpoint and COLO will do
+consecutive checkpoints. The VM state of Primary VM and Secondary VM is
+identical right after a VM checkpoint, but becomes different as the VM
+executes till the next checkpoint. To support disk contents checkpoint,
+the modified disk contents in the Secondary VM must be buffered, and are
+only dropped at next checkpoint time. To reduce the network transportation
+effort at the time of checkpoint, the disk modification operations of
+Primary disk are asynchronously forwarded to the Secondary node.
+
+== Workflow ==
+The following is the image of block replication workflow:
+
+        +----------------------+            +------------------------+
+        |Primary Write Requests|            |Secondary Write Requests|
+        +----------------------+            +------------------------+
+                  |                                       |
+                  |                                      (4)
+                  |                                       V
+                  |                              /-------------\
+                  |      Copy and Forward        |             |
+                  |---------(1)----------+       | Disk Buffer |
+                  |                      |       |             |
+                  |                     (3)      \-------------/
+                  |                 speculative      ^
+                  |                write through    (2)
+                  |                      |           |
+                  V                      V           |
+           +--------------+           +----------------+
+           | Primary Disk |           | Secondary Disk |
+           +--------------+           +----------------+
+
+    1) Primary write requests will be copied and forwarded to Secondary
+       QEMU.
+    2) Before Primary write requests are written to Secondary disk, the
+       original sector content will be read from Secondary disk and
+       buffered in the Disk buffer, but it will not overwrite the existing
+       sector content(it could be from either "Secondary Write Requests" or
+       previous COW of "Primary Write Requests") in the Disk buffer.
+    3) Primary write requests will be written to Secondary disk.
+    4) Secondary write requests will be buffered in the Disk buffer and it
+       will overwrite the existing sector content in the buffer.
+
+== Architecture ==
+We are going to implement block replication from many basic
+blocks that are already in QEMU.
+
+         virtio-blk       ||
+             ^            ||                            .----------
+             |            ||                            | Secondary
+        1 Quorum          ||                            '----------
+         /      \         ||
+        /        \        ||
+   Primary    2 filter
+     disk         ^                                                             virtio-blk
+                  |                                                                  ^
+                3 NBD  ------->  3 NBD                                               |
+                client    ||     server                                          2 filter
+                          ||        ^                                                ^
+--------.                 ||        |                                                |
+Primary |                 ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
+--------'                 ||        |          backing        ^       backing
+                          ||        |                         |
+                          ||        |                         |
+                          ||        '-------------------------'
+                          ||           drive-backup sync=none
+
+1) The disk on the primary is represented by a block device with two
+children, providing replication between a primary disk and the host that
+runs the secondary VM. The read pattern for quorum can be extended to
+make the primary always read from the local disk instead of going through
+NBD.
+
+2) The new block filter(the name is replication) will control the block
+replication.
+
+3) The secondary disk receives writes from the primary VM through QEMU's
+embedded NBD server (speculative write-through).
+
+4) The disk on the secondary is represented by a custom block device
+(called active-disk). It should be an empty disk, and the format should
+support bdrv_make_empty() and backing file.
+
+5) The hidden-disk is created automatically. It buffers the original content
+that is modified by the primary VM. It should also be an empty disk, and
+the driver supports bdrv_make_empty() and backing file.
+
+== Failure Handling ==
+There are 6 internal errors when block replication is running:
+1. I/O error on primary disk
+2. Forwarding primay write requests failed
+3. Bacup failed or writing to secondary disk failed
+4. I/O error on secondary disk
+5. I/O error on active disk
+6. Making active disk or hidden disk empty failed
+In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
+4 and 6, we just report block replication's error to FT/HA manager(which
+decides when to do a new checkpoint, when to do failover).
+There is one internal error when doing failover:
+1. Commiting the data in active disk/hidden disk to secondary disk failed
+We just to report this error to FT/HA manager.
+
+== New block driver interface ==
+We add three block driver interfaces to control block replication:
+a. bdrv_start_replication()
+   Start block replication, called in migration/checkpoint thread.
+   We must call bdrv_start_replication() in secondary QEMU before
+   calling bdrv_start_replication() in primary QEMU. The caller
+   must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+b. bdrv_do_checkpoint()
+   This interface is called after all VM state is transferred to
+   Secondary QEMU. The Disk buffer will be dropped in this interface.
+   The caller must hold the I/O mutex lock if it is in migration/checkpoint
+   thread.
+c. bdrv_stop_replication()
+   It is called on failover. We will flush the Disk buffer into
+   Secondary Disk and stop block replication. The vm should be stopped
+   before calling it. The caller must hold the I/O mutex lock if it is
+   in migration/checkpoint thread.
+
+== Usage ==
+Primary:
+  -drive if=xxx,driver=quorum,read-pattern=fifo,no-connect=on,\
+         children.0.file.filename=1.raw,\
+         children.0.driver=raw,\
+         children.1.file.driver=nbd,\
+         children.1.file.host=xxx,\
+         children.1.file.port=xxx,\
+         children.1.file.export=xxx,\
+         children.1.driver=replication,\
+         children.1.mode=primary,\
+         children.1.ignore-errors=on
+  Note:
+  1. NBD Client should not be the first child of quorum.
+  2. There should be only one NBD Client.
+  3. host is the secondary physical machine's hostname or IP
+  4. Each disk must have its own export name.
+  5. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
+
+Secondary:
+  -drive if=none,driver=raw,file=1.raw,id=nbd_target1 \
+  -drive if=xxx,driver=replication,mode=secondary,export=xxx,\
+         file.file.filename=active_disk.qcow2,\
+         file.driver=qcow2,\
+         file.backing_reference.drive_id=nbd_target1,\
+         file.backing_reference.hidden-disk.file.filename=hidden_disk.qcow2,\
+         file.backing_reference.hidden-disk.driver=qcow2,\
+         file.backing_reference.hidden-disk.allow-write-backing-file=on
+  Then run qmp command:
+    nbd-server-start host:port
+  Note:
+  1. The export name for the same disk must be the same in primary
+     and secondary QEMU command line
+  2. The qmp command nbd-server-start must be run before running the
+     qmp command migrate on primary QEMU
+  3. Don't use nbd-server-start's other options
+  4. Active disk, hidden disk and nbd target's length should be the
+     same.
+  5. It is better to put active disk and hidden disk in ramdisk.
+  6. It is all a single argument to -drive, and you should ignore
+     the leading whitespace.
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 02/16] allow writing to the backing file
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS Wen Congyang
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0ffb855..df4cbce 100644
--- a/block.c
+++ b/block.c
@@ -745,6 +745,15 @@ static const BdrvChildRole child_backing = {
     .inherit_flags = bdrv_backing_flags,
 };
 
+static int bdrv_backing_rw_flags(int flags)
+{
+    return bdrv_backing_flags(flags) | BDRV_O_RDWR;
+}
+
+static const BdrvChildRole child_backing_rw = {
+    .inherit_flags = bdrv_backing_rw_flags,
+};
+
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
     int open_flags = flags | BDRV_O_CACHE_WB;
@@ -1131,6 +1140,20 @@ out:
     bdrv_refresh_limits(bs, NULL);
 }
 
+#define ALLOW_WRITE_BACKING_FILE    "allow-write-backing-file"
+static QemuOptsList backing_file_opts = {
+    .name = "backing_file",
+    .head = QTAILQ_HEAD_INITIALIZER(backing_file_opts.head),
+    .desc = {
+        {
+            .name = ALLOW_WRITE_BACKING_FILE,
+            .type = QEMU_OPT_BOOL,
+            .help = "allow write to backing file",
+        },
+        { /* end of list */ }
+    },
+};
+
 /*
  * Opens the backing file for a BlockDriverState if not yet open
  *
@@ -1145,6 +1168,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     int ret = 0;
     BlockDriverState *backing_hd;
     Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    bool child_rw = false;
 
     if (bs->backing_hd != NULL) {
         QDECREF(options);
@@ -1157,6 +1182,18 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     }
 
     bs->open_flags &= ~BDRV_O_NO_BACKING;
+
+    opts = qemu_opts_create(&backing_file_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        error_propagate(errp, local_err);
+        QDECREF(options);
+        goto free_exit;
+    }
+    child_rw = qemu_opt_get_bool(opts, ALLOW_WRITE_BACKING_FILE, false);
+    qemu_opts_del(opts);
+
     if (qdict_haskey(options, "file.filename")) {
         backing_filename[0] = '\0';
     } else if (bs->backing_file[0] == '\0' && qdict_size(options) == 0) {
@@ -1189,7 +1226,8 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     assert(bs->backing_hd == NULL);
     ret = bdrv_open_inherit(&backing_hd,
                             *backing_filename ? backing_filename : NULL,
-                            NULL, options, 0, bs, &child_backing,
+                            NULL, options, 0, bs,
+                            child_rw ? &child_backing_rw : &child_backing,
                             NULL, &local_err);
     if (ret < 0) {
         bdrv_unref(backing_hd);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 02/16] allow writing to the backing file Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 10:40   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jeff Cody, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Yang Hongyang, zhanghailiang

When opening BDS, we need to create backup jobs for
image-fleecing.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index c34fd7c..f068666 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -22,10 +22,10 @@ block-obj-$(CONFIG_ARCHIPELAGO) += archipelago.o
 block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
+block-obj-y += backup.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
-common-obj-y += backup.o
 
 iscsi.o-cflags     := $(LIBISCSI_CFLAGS)
 iscsi.o-libs       := $(LIBISCSI_LIBS)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (2 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 10:50   ` Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 05/16] Backup: clear all bitmap when doing block checkpoint Wen Congyang
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Usage:
-drive file=xxx,id=Y, \
-drive file=xxxx,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*

It will create such backing chain:
               {virtio-blk dev 'Y'}                                      {virtio-blk dev 'X'}
                         |                                                          |
                         |                                                          |
                         v                                                          v

    [base] <- [mid] <- ( Y )  <----------------- (hidden target) <--------------- ( X )

                         v                              ^
                         v                              ^
                         v                              ^
                         v                              ^
                         >>>> drive-backup sync=none >>>>

X's backing file is hidden-disk, and hidden-disk's backing file is Y.
Disk Y may be opened or reopened in read-write mode, so A block backup
job is automatically created: source is Y and target is hidden disk.
Active disk X, hidden disk, and Y are all on the same AioContext.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c                    | 154 ++++++++++++++++++++++++++++++++++++++++++++-
 include/block/block.h      |   1 +
 include/block/block_int.h  |   1 +
 tests/qemu-iotests/051     |  13 ++++
 tests/qemu-iotests/051.out |  13 ++++
 5 files changed, 179 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index df4cbce..d1ed227 100644
--- a/block.c
+++ b/block.c
@@ -1245,6 +1245,119 @@ free_exit:
     return ret;
 }
 
+static void backing_reference_completed(void *opaque, int ret)
+{
+    BlockDriverState *hidden_disk = opaque;
+
+    assert(!hidden_disk->backing_reference);
+}
+
+static int bdrv_open_backing_reference_file(BlockDriverState *bs,
+                                            QDict *options, Error **errp)
+{
+    const char *backing_name;
+    QDict *hidden_disk_options = NULL;
+    BlockDriverState *backing_hd, *hidden_disk;
+    BlockBackend *backing_blk;
+    AioContext *aio_context;
+    Error *local_err = NULL;
+    int ret = 0;
+
+    backing_name = qdict_get_try_str(options, "drive_id");
+    if (!backing_name) {
+        error_setg(errp, "Backing reference needs option drive_id");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+    qdict_del(options, "drive_id");
+
+    qdict_extract_subqdict(options, &hidden_disk_options, "hidden-disk.");
+    if (!qdict_size(hidden_disk_options)) {
+        error_setg(errp, "Backing reference needs option hidden-disk.*");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    if (qdict_size(options)) {
+        const QDictEntry *entry = qdict_first(options);
+        error_setg(errp, "Backing reference used by '%s' doesn't support "
+                   "the option '%s'", bdrv_get_device_name(bs), entry->key);
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    backing_blk = blk_by_name(backing_name);
+    if (!backing_blk) {
+        error_setg(errp, "Device '%s' not found", backing_name);
+        ret = -ENOENT;
+        goto free_exit;
+    }
+
+    backing_hd = blk_bs(backing_blk);
+    /* Backing reference itself? */
+    if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
+        error_setg(errp, "Backing reference itself");
+        ret = -EINVAL;
+        goto free_exit;
+    }
+
+    if (bdrv_op_is_blocked(backing_hd, BLOCK_OP_TYPE_BACKING_REFERENCE,
+                           errp)) {
+        ret = -EBUSY;
+        goto free_exit;
+    }
+
+    /* hidden-disk is bs's backing file */
+    ret = bdrv_open_backing_file(bs, hidden_disk_options, errp);
+    hidden_disk_options = NULL;
+    if (ret < 0) {
+        goto free_exit;
+    }
+
+    hidden_disk = bs->backing_hd;
+    if (!hidden_disk->drv || !hidden_disk->drv->supports_backing) {
+        ret = -EINVAL;
+        error_setg(errp, "Hidden disk's driver doesn't support backing files");
+        goto free_exit;
+    }
+
+    bdrv_set_backing_hd(hidden_disk, backing_hd);
+    bdrv_ref(backing_hd);
+
+    /*
+     * backing hd may be opened or reopened in read-write mode, so we
+     * should backup backing hd to hidden disk
+     */
+    bdrv_op_unblock(hidden_disk, BLOCK_OP_TYPE_BACKUP_TARGET,
+                    bs->backing_blocker);
+    bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_SOURCE,
+                    hidden_disk->backing_blocker);
+
+    bdrv_ref(hidden_disk);
+
+    aio_context = bdrv_get_aio_context(bs);
+    aio_context_acquire(aio_context);
+    bdrv_set_aio_context(backing_hd, aio_context);
+    backup_start(backing_hd, hidden_disk, 0, MIRROR_SYNC_MODE_NONE, NULL,
+                 BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+                 backing_reference_completed, hidden_disk, &local_err);
+    aio_context_release(aio_context);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        bdrv_unref(hidden_disk);
+        /* FIXME, use which errno? */
+        ret = -EIO;
+        goto free_exit;
+    }
+
+    bs->backing_reference = true;
+
+free_exit:
+    QDECREF(hidden_disk_options);
+    QDECREF(options);
+    return ret;
+}
+
 /*
  * Opens a disk image whose options are given as BlockdevRef in another block
  * device's options.
@@ -1527,13 +1640,37 @@ static int bdrv_open_inherit(BlockDriverState **pbs, const char *filename,
 
     /* If there is a backing file, use it */
     if ((flags & BDRV_O_NO_BACKING) == 0) {
-        QDict *backing_options;
+        QDict *backing_options, *backing_reference_options;
 
+        qdict_extract_subqdict(options, &backing_reference_options,
+                               "backing_reference.");
         qdict_extract_subqdict(options, &backing_options, "backing.");
-        ret = bdrv_open_backing_file(bs, backing_options, &local_err);
-        if (ret < 0) {
+
+        if (qdict_size(backing_reference_options) &&
+            qdict_size(backing_options)) {
+            error_setg(&local_err,
+                       "Option \"backing_reference.*\" and \"backing.*\""
+                       " cannot be used together");
+            ret = -EINVAL;
+            QDECREF(backing_reference_options);
+            QDECREF(backing_options);
             goto close_and_fail;
         }
+        if (qdict_size(backing_reference_options)) {
+            QDECREF(backing_options);
+            ret = bdrv_open_backing_reference_file(bs,
+                                                   backing_reference_options,
+                                                   &local_err);
+            if (ret) {
+                goto close_and_fail;
+            }
+        } else {
+            QDECREF(backing_reference_options);
+            ret = bdrv_open_backing_file(bs, backing_options, &local_err);
+            if (ret < 0) {
+                goto close_and_fail;
+            }
+        }
     }
 
     bdrv_refresh_filename(bs);
@@ -1895,6 +2032,14 @@ void bdrv_close(BlockDriverState *bs)
 
         if (bs->backing_hd) {
             BlockDriverState *backing_hd = bs->backing_hd;
+            if (bs->backing_reference) {
+                assert(backing_hd->backing_hd);
+                if (backing_hd->backing_hd->job) {
+                    block_job_cancel(backing_hd->backing_hd->job);
+                }
+                bdrv_set_backing_hd(backing_hd, NULL);
+                bdrv_unref(backing_hd->backing_hd);
+            }
             bdrv_set_backing_hd(bs, NULL);
             bdrv_unref(backing_hd);
         }
@@ -3934,6 +4079,9 @@ void bdrv_attach_aio_context(BlockDriverState *bs,
 
     bs->aio_context = new_context;
 
+    if (bs->backing_reference) {
+        bdrv_attach_aio_context(bs->backing_hd->backing_hd, new_context);
+    }
     if (bs->backing_hd) {
         bdrv_attach_aio_context(bs->backing_hd, new_context);
     }
diff --git a/include/block/block.h b/include/block/block.h
index 07bb724..7cdb569 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -168,6 +168,7 @@ typedef enum BlockOpType {
     BLOCK_OP_TYPE_RESIZE,
     BLOCK_OP_TYPE_STREAM,
     BLOCK_OP_TYPE_REPLACE,
+    BLOCK_OP_TYPE_BACKING_REFERENCE,
     BLOCK_OP_TYPE_MAX,
 } BlockOpType;
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 888ec09..87fe89a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -380,6 +380,7 @@ struct BlockDriverState {
     QDict *full_open_options;
     char exact_filename[PATH_MAX];
 
+    bool backing_reference;
     BlockDriverState *backing_hd;
     BlockDriverState *file;
 
diff --git a/tests/qemu-iotests/051 b/tests/qemu-iotests/051
index 4a8055b..2aa3060 100755
--- a/tests/qemu-iotests/051
+++ b/tests/qemu-iotests/051
@@ -116,6 +116,19 @@ run_qemu -drive file="$TEST_IMG",file.backing.driver=file,file.backing.filename=
 run_qemu -drive file="$TEST_IMG",file.backing.driver=qcow2,file.backing.file.filename="$TEST_IMG.orig"
 
 echo
+echo === Backing file reference ===
+echo
+
+run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
+    -drive file="$TEST_IMG",driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename="$TEST_IMG.hidden"
+
+run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
+    -drive file="$TEST_IMG",driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename="$TEST_IMG.hidden",backing.file.filename="$TEST_IMG.orig"
+
+run_qemu -drive file="$TEST_IMG",if=none,id=drive0 \
+    -drive file="$TEST_IMG",driver=qcow2,file.backing_reference.drive_id=drive0,file.backing_reference.hidden-disk.filename="$TEST_IMG.hidden",file.backing.file.filename="$TEST_IMG.orig"
+
+echo
 echo === Enable and disable lazy refcounting on the command line, plus some invalid values ===
 echo
 
diff --git a/tests/qemu-iotests/051.out b/tests/qemu-iotests/051.out
index 652dd63..a33d102 100644
--- a/tests/qemu-iotests/051.out
+++ b/tests/qemu-iotests/051.out
@@ -75,6 +75,19 @@ Testing: -drive file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.fil
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,file.backing.driver=qcow2,file.backing.file.filename=TEST_DIR/t.qcow2.orig: Driver doesn't support backing files
 
 
+=== Backing file reference ===
+
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) q^[[K^[[Dqu^[[K^[[D^[[Dqui^[[K^[[D^[[D^[[Dquit^[[K
+
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing_reference.drive_id=drive0,backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden,backing.file.filename=TEST_DIR/t.qcow2.orig
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,driver=qcow2,backing=drive0,backing.file.filename=TEST_DIR/t.qcow2.orig: Option "backing_reference.*" and "backing.*" cannot be used together
+
+Testing: -drive file=TEST_DIR/t.qcow2,if=none,id=drive0 -drive file=TEST_DIR/t.qcow2,driver=qcow2,file.backing_reference.drive_id=drive0,file.backing_reference.hidden-disk.filename=TEST_DIR/t.qcow2.hidden,file.backing.file.filename=TEST_DIR/t.qcow2.orig
+QEMU_PROG: -drive file=TEST_DIR/t.qcow2,driver=qcow2,file.backing=drive0,file.backing.file.filename=TEST_DIR/t.qcow2.orig: Option "backing_reference.*" and "backing.*" cannot be used together
+
+
 === Enable and disable lazy refcounting on the command line, plus some invalid values ===
 
 Testing: -drive file=TEST_DIR/t.qcow2,format=qcow2,lazy-refcounts=on
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 05/16] Backup: clear all bitmap when doing block checkpoint
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (3 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target Wen Congyang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jeff Cody, Jiang Yunhong,
	Dong Eddie, Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi,
	Yang Hongyang, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Jeff Cody <jcody@redhat.com>
---
 block/backup.c           | 13 +++++++++++++
 blockjob.c               | 10 ++++++++++
 include/block/blockjob.h | 12 ++++++++++++
 3 files changed, 35 insertions(+)

diff --git a/block/backup.c b/block/backup.c
index d3f648d..d3d8ba7 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -210,11 +210,24 @@ static void backup_iostatus_reset(BlockJob *job)
     bdrv_iostatus_reset(s->target);
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
     .set_speed      = backup_set_speed,
     .iostatus_reset = backup_iostatus_reset,
+    .do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index 2755465..9d2128a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -399,3 +399,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
 
     qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+    if (!job->driver->do_checkpoint) {
+        error_setg(errp, "this feature or command is not currently supported");
+        return;
+    }
+
+    job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 57d8ef1..b832dc3 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@ typedef struct BlockJobDriver {
      * manually.
      */
     void (*complete)(BlockJob *job, Error **errp);
+
+    /** Optional callback for job types that support checkpoint. */
+    void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -348,4 +351,13 @@ void block_job_defer_to_main_loop(BlockJob *job,
                                   BlockJobDeferToMainLoopFn *fn,
                                   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (4 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 05/16] Backup: clear all bitmap when doing block checkpoint Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 12:47   ` Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target Wen Congyang
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index d1ed227..0b41af4 100644
--- a/block.c
+++ b/block.c
@@ -1294,6 +1294,14 @@ static int bdrv_open_backing_reference_file(BlockDriverState *bs,
     }
 
     backing_hd = blk_bs(backing_blk);
+    /* Don't allow a disk use backing reference target */
+    ret = blk_attach_dev(backing_hd->blk, bs);
+    if (ret < 0) {
+        error_setg(errp, "backing_hd %s is used by the other device model",
+                   backing_name);
+        goto free_exit;
+    }
+
     /* Backing reference itself? */
     if (backing_hd == bs || bdrv_find_overlay(backing_hd, bs)) {
         error_setg(errp, "Backing reference itself");
@@ -2037,6 +2045,7 @@ void bdrv_close(BlockDriverState *bs)
                 if (backing_hd->backing_hd->job) {
                     block_job_cancel(backing_hd->backing_hd->job);
                 }
+                blk_detach_dev(backing_hd->backing_hd->blk, bs);
                 bdrv_set_backing_hd(backing_hd, NULL);
                 bdrv_unref(backing_hd->backing_hd);
             }
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (5 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 12:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 08/16] NBD client: implement block driver interfaces to connect/disconnect NBD server Wen Congyang
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

In some cases, we want to connect/disconnect the remote target when
we need, not in bdrv_open()/bdrv_close().

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c                   | 24 ++++++++++++++++++++++++
 include/block/block.h     |  3 +++
 include/block/block_int.h |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/block.c b/block.c
index 0b41af4..59071d4 100644
--- a/block.c
+++ b/block.c
@@ -4400,3 +4400,27 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs)
 {
     return &bs->stats;
 }
+
+void bdrv_connect(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_connect) {
+        drv->bdrv_connect(bs, errp);
+    } else if (bs->file) {
+        bdrv_connect(bs->file, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
+
+void bdrv_disconnect(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_disconnect) {
+        drv->bdrv_disconnect(bs);
+    } else if (bs->file) {
+        bdrv_disconnect(bs->file);
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 7cdb569..2c2a0cc 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -606,4 +606,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs);
 
 BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 
+void bdrv_connect(BlockDriverState *bs, Error **errp);
+void bdrv_disconnect(BlockDriverState *bs);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 87fe89a..a3e5372 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -290,6 +290,9 @@ struct BlockDriver {
      */
     int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
 
+    void (*bdrv_connect)(BlockDriverState *bs, Error **errp);
+    void (*bdrv_disconnect)(BlockDriverState *bs);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 08/16] NBD client: implement block driver interfaces to connect/disconnect NBD server
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (6 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 09/16] Introduce a new -drive option to control whether to connect to remote target Wen Congyang
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/nbd.c | 67 ++++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2176186..bc9477a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -44,6 +44,8 @@
 typedef struct BDRVNBDState {
     NbdClientSession client;
     QemuOpts *socket_opts;
+    char *export;
+    bool connected;
 } BDRVNBDState;
 
 static int nbd_parse_uri(const char *filename, QDict *options)
@@ -254,34 +256,56 @@ static int nbd_establish_connection(BlockDriverState *bs, Error **errp)
     return sock;
 }
 
-static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
-                    Error **errp)
+static void nbd_connect_server(BlockDriverState *bs, Error **errp)
 {
     BDRVNBDState *s = bs->opaque;
-    char *export = NULL;
-    int result, sock;
-    Error *local_err = NULL;
-
-    /* Pop the config into our state object. Exit if invalid. */
-    nbd_config(s, options, &export, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return -EINVAL;
-    }
+    int sock;
 
     /* establish TCP connection, return error if it fails
      * TODO: Configurable retry-until-timeout behaviour.
      */
     sock = nbd_establish_connection(bs, errp);
     if (sock < 0) {
-        g_free(export);
-        return sock;
+        return;
     }
 
     /* NBD handshake */
-    result = nbd_client_init(bs, sock, export, errp);
-    g_free(export);
-    return result;
+    nbd_client_init(bs, sock, s->export, errp);
+
+    s->connected = true;
+}
+
+static void nbd_disconnect_server(BlockDriverState *bs)
+{
+    BDRVNBDState *s = bs->opaque;
+
+    if (s->connected) {
+        nbd_client_close(bs);
+        s->connected = false;
+    }
+}
+
+static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
+                    Error **errp)
+{
+    BDRVNBDState *s = bs->opaque;
+    Error *local_err = NULL;
+
+    /* Pop the config into our state object. Exit if invalid. */
+    nbd_config(s, options, &s->export, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -EINVAL;
+    }
+
+    nbd_connect_server(bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        g_free(s->export);
+        return -EINVAL;
+    }
+
+    return 0;
 }
 
 static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -318,7 +342,8 @@ static void nbd_close(BlockDriverState *bs)
     BDRVNBDState *s = bs->opaque;
 
     qemu_opts_del(s->socket_opts);
-    nbd_client_close(bs);
+    nbd_disconnect_server(bs);
+    g_free(s->export);
 }
 
 static int64_t nbd_getlength(BlockDriverState *bs)
@@ -400,6 +425,8 @@ static BlockDriver bdrv_nbd = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_connect               = nbd_connect_server,
+    .bdrv_disconnect            = nbd_disconnect_server,
 };
 
 static BlockDriver bdrv_nbd_tcp = {
@@ -418,6 +445,8 @@ static BlockDriver bdrv_nbd_tcp = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_connect               = nbd_connect_server,
+    .bdrv_disconnect            = nbd_disconnect_server,
 };
 
 static BlockDriver bdrv_nbd_unix = {
@@ -436,6 +465,8 @@ static BlockDriver bdrv_nbd_unix = {
     .bdrv_detach_aio_context    = nbd_detach_aio_context,
     .bdrv_attach_aio_context    = nbd_attach_aio_context,
     .bdrv_refresh_filename      = nbd_refresh_filename,
+    .bdrv_connect               = nbd_connect_server,
+    .bdrv_disconnect            = nbd_disconnect_server,
 };
 
 static void bdrv_nbd_init(void)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 09/16] Introduce a new -drive option to control whether to connect to remote target
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (7 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 08/16] NBD client: implement block driver interfaces to connect/disconnect NBD server Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 10/16] NBD client: connect to nbd server later Wen Congyang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 blockdev.c            | 8 ++++++++
 include/block/block.h | 1 +
 qemu-options.hx       | 4 ++++
 3 files changed, 13 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 1cd1b79..07b0477 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -431,6 +431,10 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
         qdict_put(bs_opts, "driver", qstring_from_str(buf));
     }
 
+    if (qemu_opt_get_bool(opts, "no-connect", false)) {
+        bdrv_flags |= BDRV_O_NO_CONNECT;
+    }
+
     /* disk I/O throttling */
     memset(&cfg, 0, sizeof(cfg));
     cfg.buckets[THROTTLE_BPS_TOTAL].avg =
@@ -3214,6 +3218,10 @@ QemuOptsList qemu_common_drive_opts = {
             .name = "detect-zeroes",
             .type = QEMU_OPT_STRING,
             .help = "try to optimize zero writes (off, on, unmap)",
+        },{
+            .name = "no-connect",
+            .type = QEMU_OPT_BOOL,
+            .help = "enable whether to connect remote target"
         },
         { /* end of list */ }
     },
diff --git a/include/block/block.h b/include/block/block.h
index 2c2a0cc..4b3a2b9 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -88,6 +88,7 @@ typedef struct HDGeometry {
 #define BDRV_O_PROTOCOL    0x8000  /* if no block driver is explicitly given:
                                       select an appropriate protocol driver,
                                       ignoring the format layer */
+#define BDRV_O_NO_CONNECT  0x10000 /* do not connect to remote target */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 5438f98..7bdd7b7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -469,6 +469,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [[,iops_max=im]|[[,iops_rd_max=irm][,iops_wr_max=iwm]]]\n"
     "       [[,iops_size=is]]\n"
     "       [[,group=g]]\n"
+    "       [,no-connect=on|off]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -530,6 +531,9 @@ file sectors into the image file.
 conversion of plain zero writes by the OS to driver specific optimized
 zero write commands. You may even choose "unmap" if @var{discard} is set
 to "unmap" to allow a zero write to be converted to an UNMAP operation.
+@item no-connect=@var{no-connect}
+@var{no-connect} is "on" or "off", and enables whether to connect to remote
+target when open the drive. The default value is "off".
 @end table
 
 By default, the @option{cache=writeback} mode is used. It will report data
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 10/16] NBD client: connect to nbd server later
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (8 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 09/16] Introduce a new -drive option to control whether to connect to remote target Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 11/16] Add new block driver interfaces to control block replication Wen Congyang
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

The secondary qemu starts later than the primary qemu, so we
cannot connect to nbd server in bdrv_open(). Introduce a new
open flags to control it.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/nbd.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index bc9477a..4964cf8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -298,11 +298,13 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
         return -EINVAL;
     }
 
-    nbd_connect_server(bs, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        g_free(s->export);
-        return -EINVAL;
+    if (!(flags & BDRV_O_NO_CONNECT)) {
+        nbd_connect_server(bs, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            g_free(s->export);
+            return -EINVAL;
+        }
     }
 
     return 0;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 11/16] Add new block driver interfaces to control block replication
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (9 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 10/16] NBD client: connect to nbd server later Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 12/16] skip nbd_target when starting " Wen Congyang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Luiz Capitulino, Gonglei,
	Stefan Hajnoczi, Yang Hongyang, Michael Roth, zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c                   | 40 ++++++++++++++++++++++++++++++++++++++++
 include/block/block.h     |  5 +++++
 include/block/block_int.h | 14 ++++++++++++++
 qapi/block.json           | 16 ++++++++++++++++
 4 files changed, 75 insertions(+)

diff --git a/block.c b/block.c
index 59071d4..06222bf 100644
--- a/block.c
+++ b/block.c
@@ -4424,3 +4424,43 @@ void bdrv_disconnect(BlockDriverState *bs)
         bdrv_disconnect(bs->file);
     }
 }
+
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_start_replication) {
+        drv->bdrv_start_replication(bs, mode, errp);
+    } else if (bs->file) {
+        bdrv_start_replication(bs->file, mode, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
+
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_do_checkpoint) {
+        drv->bdrv_do_checkpoint(bs, errp);
+    } else if (bs->file) {
+        bdrv_do_checkpoint(bs->file, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
+
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BlockDriver *drv = bs->drv;
+
+    if (drv && drv->bdrv_stop_replication) {
+        drv->bdrv_stop_replication(bs, failover, errp);
+    } else if (bs->file) {
+        bdrv_stop_replication(bs->file, failover, errp);
+    } else {
+        error_setg(errp, "this feature or command is not currently supported");
+    }
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4b3a2b9..573d39f 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -610,4 +610,9 @@ BlockAcctStats *bdrv_get_stats(BlockDriverState *bs);
 void bdrv_connect(BlockDriverState *bs, Error **errp);
 void bdrv_disconnect(BlockDriverState *bs);
 
+void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                            Error **errp);
+void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp);
+void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp);
+
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a3e5372..27ff3da 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -293,6 +293,20 @@ struct BlockDriver {
     void (*bdrv_connect)(BlockDriverState *bs, Error **errp);
     void (*bdrv_disconnect)(BlockDriverState *bs);
 
+    void (*bdrv_start_replication)(BlockDriverState *bs, ReplicationMode mode,
+                                   Error **errp);
+    /* Drop Disk buffer when doing checkpoint. */
+    void (*bdrv_do_checkpoint)(BlockDriverState *bs, Error **errp);
+    /*
+     * After failover, we should flush Disk buffer into secondary disk
+     * and stop block replication.
+     *
+     * If the guest is shutdown, we should drop Disk buffer and stop
+     * block representation.
+     */
+    void (*bdrv_stop_replication)(BlockDriverState *bs, bool failover,
+                                  Error **errp);
+
     QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/qapi/block.json b/qapi/block.json
index aad645c..04dc4c2 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -40,6 +40,22 @@
   'data': ['auto', 'none', 'lba', 'large', 'rechs']}
 
 ##
+# @ReplicationMode
+#
+# An enumeration of replication modes.
+#
+# @unprotected: Replication is not started or after failover.
+#
+# @primary: Primary mode, the vm's state will be sent to secondary QEMU.
+#
+# @secondary: Secondary mode, receive the vm's state from primary QEMU.
+#
+# Since: 2.4
+##
+{ 'enum' : 'ReplicationMode',
+  'data' : ['unprotected', 'primary', 'secondary']}
+
+##
 # @BlockdevSnapshotInternal
 #
 # @device: the name of the device to generate the snapshot from
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 12/16] skip nbd_target when starting block replication
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (10 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 11/16] Add new block driver interfaces to control block replication Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for " Wen Congyang
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 06222bf..2108d02 100644
--- a/block.c
+++ b/block.c
@@ -4430,6 +4430,10 @@ void bdrv_start_replication(BlockDriverState *bs, ReplicationMode mode,
 {
     BlockDriver *drv = bs->drv;
 
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+        return;
+    }
+
     if (drv && drv->bdrv_start_replication) {
         drv->bdrv_start_replication(bs, mode, errp);
     } else if (bs->file) {
@@ -4443,6 +4447,10 @@ void bdrv_do_checkpoint(BlockDriverState *bs, Error **errp)
 {
     BlockDriver *drv = bs->drv;
 
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+        return;
+    }
+
     if (drv && drv->bdrv_do_checkpoint) {
         drv->bdrv_do_checkpoint(bs, errp);
     } else if (bs->file) {
@@ -4456,6 +4464,10 @@ void bdrv_stop_replication(BlockDriverState *bs, bool failover, Error **errp)
 {
     BlockDriver *drv = bs->drv;
 
+    if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKING_REFERENCE, NULL)) {
+        return;
+    }
+
     if (drv && drv->bdrv_stop_replication) {
         drv->bdrv_stop_replication(bs, failover, errp);
     } else if (bs->file) {
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for block replication
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (11 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 12/16] skip nbd_target when starting " Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 13:06   ` Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 14/16] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/quorum.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 77e55b2..01cfac0 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -82,6 +82,8 @@ typedef struct BDRVQuorumState {
                             */
 
     QuorumReadPattern read_pattern;
+
+    int replication_index; /* store which child supports block replication */
 } BDRVQuorumState;
 
 typedef struct QuorumAIOCB QuorumAIOCB;
@@ -945,6 +947,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     g_free(opened);
+    s->replication_index = -1;
     goto exit;
 
 close_exit:
@@ -1032,6 +1035,77 @@ static void quorum_refresh_filename(BlockDriverState *bs)
     bs->full_open_options = opts;
 }
 
+static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
+                                     Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+    int count = 0, i, index;
+    Error *local_err = NULL;
+
+    /*
+     * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
+     * QEMU becoming primary QEMU.
+     */
+    if (mode != REPLICATION_MODE_PRIMARY) {
+        error_setg(errp, "Invalid parameter 'mode'");
+        return;
+    }
+
+    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
+        error_setg(errp, "Invalid parameter 'read pattern'");
+        return;
+    }
+
+    for (i = 0; i < s->num_children; i++) {
+        bdrv_start_replication(s->bs[i], mode, &local_err);
+        if (local_err) {
+            error_free(local_err);
+            local_err = NULL;
+        } else {
+            count++;
+            index = i;
+        }
+    }
+
+    if (count == 0) {
+        /* No child supports block replication */
+        error_setg(errp, "this feature or command is not currently supported");
+    } else if (count > 1) {
+        for (i = 0; i < s->num_children; i++) {
+            bdrv_stop_replication(s->bs[i], false, NULL);
+        }
+        error_setg(errp, "too many children support block replication");
+    } else {
+        s->replication_index = index;
+    }
+}
+
+static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not started");
+        return;
+    }
+
+    bdrv_do_checkpoint(s->bs[s->replication_index], errp);
+}
+
+static void quorum_stop_replication(BlockDriverState *bs, bool failover,
+                                    Error **errp)
+{
+    BDRVQuorumState *s = bs->opaque;
+
+    if (s->replication_index < 0) {
+        error_setg(errp, "Block replication is not started");
+        return;
+    }
+
+    bdrv_stop_replication(s->bs[s->replication_index], failover, errp);
+    s->replication_index = -1;
+}
+
 static BlockDriver bdrv_quorum = {
     .format_name                        = "quorum",
     .protocol_name                      = "quorum",
@@ -1055,6 +1129,10 @@ static BlockDriver bdrv_quorum = {
 
     .is_filter                          = true,
     .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
+
+    .bdrv_start_replication             = quorum_start_replication,
+    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
+    .bdrv_stop_replication              = quorum_stop_replication,
 };
 
 static void bdrv_quorum_init(void)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 14/16] introduce a new API qemu_opts_absorb_qdict_by_index()
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (12 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for " Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors Wen Congyang
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 16/16] Implement new driver for block replication Wen Congyang
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 include/qemu/option.h |  2 ++
 util/qemu-option.c    | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/qemu/option.h b/include/qemu/option.h
index ac0e43b..f868893 100644
--- a/include/qemu/option.h
+++ b/include/qemu/option.h
@@ -126,6 +126,8 @@ QemuOpts *qemu_opts_from_qdict(QemuOptsList *list, const QDict *qdict,
                                Error **errp);
 QDict *qemu_opts_to_qdict(QemuOpts *opts, QDict *qdict);
 void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp);
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+                                     const char *index, Error **errp);
 
 typedef int (*qemu_opts_loopfunc)(void *opaque, QemuOpts *opts, Error **errp);
 int qemu_opts_foreach(QemuOptsList *list, qemu_opts_loopfunc func,
diff --git a/util/qemu-option.c b/util/qemu-option.c
index 840f5f7..0c8e898 100644
--- a/util/qemu-option.c
+++ b/util/qemu-option.c
@@ -1006,6 +1006,50 @@ void qemu_opts_absorb_qdict(QemuOpts *opts, QDict *qdict, Error **errp)
 }
 
 /*
+ * Adds all QDict entries to the QemuOpts that can be added and removes them
+ * from the QDict. The key starts with "%index." in the %qdict. When this
+ * function returns, the QDict contains only those entries that couldn't be
+ * added to the QemuOpts.
+ */
+void qemu_opts_absorb_qdict_by_index(QemuOpts *opts, QDict *qdict,
+                                     const char *index, Error **errp)
+{
+    const QDictEntry *entry, *next;
+    const char *key;
+    int len = strlen(index);
+
+    entry = qdict_first(qdict);
+
+    while (entry != NULL) {
+        Error *local_err = NULL;
+        OptsFromQDictState state = {
+            .errp = &local_err,
+            .opts = opts,
+        };
+
+        next = qdict_next(qdict, entry);
+        if (strncmp(entry->key, index, len) || *(entry->key + len) != '.') {
+            entry = next;
+            continue;
+        }
+
+        key = entry->key + len + 1;
+
+        if (find_desc_by_name(opts->list->desc, key)) {
+            qemu_opts_from_qdict_1(key, entry->value, &state);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                return;
+            } else {
+                qdict_del(qdict, entry->key);
+            }
+        }
+
+        entry = next;
+    }
+}
+
+/*
  * Convert from QemuOpts to QDict.
  * The QDict values are of type QString.
  * TODO We'll want to use types appropriate for opt->desc->type, but
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (13 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 14/16] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  2015-06-18 13:06   ` Stefan Hajnoczi
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 16/16] Implement new driver for block replication Wen Congyang
  15 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

If the child is not ready, read/write/getlength/flush will
return -errno. It is not critical error, and can be ignored:
1. read/write:
   Just not report the error event.
2. getlength:
   just ignore it. If all children's getlength return -errno,
   and be ignored, return -EIO.
3. flush:
   Just ignore it. If all children's getlength return -errno,
   and be ignored, return 0.

Usage: children.x.ignore-errors=true

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/quorum.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 77 insertions(+), 7 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 01cfac0..c5dbb69 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -30,6 +30,7 @@
 #define QUORUM_OPT_BLKVERIFY      "blkverify"
 #define QUORUM_OPT_REWRITE        "rewrite-corrupted"
 #define QUORUM_OPT_READ_PATTERN   "read-pattern"
+#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   "ignore-errors"
 
 /* This union holds a vote hash value */
 typedef union QuorumVoteValue {
@@ -65,6 +66,7 @@ typedef struct QuorumVotes {
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
     BlockDriverState **bs; /* children BlockDriverStates */
+    bool *ignore_errors;   /* ignore children's error? */
     int num_children;      /* children count */
     int threshold;         /* if less than threshold children reads gave the
                             * same result a quorum error occurs.
@@ -99,6 +101,7 @@ typedef struct QuorumChildRequest {
     uint8_t *buf;
     int ret;
     QuorumAIOCB *parent;
+    int index;
 } QuorumChildRequest;
 
 /* Quorum will use the following structure to track progress of each read/write
@@ -211,6 +214,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
         acb->qcrs[i].buf = NULL;
         acb->qcrs[i].ret = 0;
         acb->qcrs[i].parent = acb;
+        acb->qcrs[i].index = i;
     }
 
     return acb;
@@ -304,7 +308,7 @@ static void quorum_aio_cb(void *opaque, int ret)
     acb->count++;
     if (ret == 0) {
         acb->success_count++;
-    } else {
+    } else if (!s->ignore_errors[sacb->index]) {
         quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
     }
     assert(acb->count <= s->num_children);
@@ -719,19 +723,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 static int64_t quorum_getlength(BlockDriverState *bs)
 {
     BDRVQuorumState *s = bs->opaque;
-    int64_t result;
+    int64_t result = -EIO;
     int i;
 
     /* check that all file have the same length */
-    result = bdrv_getlength(s->bs[0]);
-    if (result < 0) {
-        return result;
-    }
-    for (i = 1; i < s->num_children; i++) {
+    for (i = 0; i < s->num_children; i++) {
         int64_t value = bdrv_getlength(s->bs[i]);
+
         if (value < 0) {
             return value;
         }
+
+        if (value == 0 && s->ignore_errors[i]) {
+            /*
+             * If the child is not ready, it cannot return -errno,
+             * otherwise refresh_total_sectors() will fail when
+             * we open the child.
+             */
+            continue;
+        }
+
+        if (result == -EIO) {
+            result = value;
+            continue;
+        }
+
         if (value != result) {
             return -EIO;
         }
@@ -769,6 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
 
     for (i = 0; i < s->num_children; i++) {
         result = bdrv_co_flush(s->bs[i]);
+        if (result < 0 && s->ignore_errors[i]) {
+            result = 0;
+        }
         result_value.l = result;
         quorum_count_vote(&error_votes, &result_value, i);
     }
@@ -843,6 +862,19 @@ static QemuOptsList quorum_runtime_opts = {
     },
 };
 
+static QemuOptsList quorum_children_common_opts = {
+    .name = "quorum children",
+    .head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
+    .desc = {
+        {
+            .name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+            .type = QEMU_OPT_BOOL,
+            .help = "ignore child I/O error",
+        },
+        { /* end of list */ }
+    },
+};
+
 static int parse_read_pattern(const char *opt)
 {
     int i;
@@ -861,6 +893,37 @@ static int parse_read_pattern(const char *opt)
     return -EINVAL;
 }
 
+static int parse_children_options(BDRVQuorumState *s, QDict *options,
+                                  const char *indexstr, int index,
+                                  Error **errp)
+{
+    QemuOpts *children_opts = NULL;
+    Error *local_err = NULL;
+    int ret = 0;
+    bool value;
+
+    children_opts = qemu_opts_create(&quorum_children_common_opts, NULL, 0,
+                                     &error_abort);
+    qemu_opts_absorb_qdict_by_index(children_opts, options, indexstr,
+                                    &local_err);
+    if (local_err) {
+        ret = -EINVAL;
+        goto out;
+    }
+
+    value = qemu_opt_get_bool(children_opts, QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
+                              false);
+    s->ignore_errors[index] = value;
+
+out:
+    qemu_opts_del(children_opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
 static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
                        Error **errp)
 {
@@ -931,12 +994,18 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
     /* allocate the children BlockDriverState array */
     s->bs = g_new0(BlockDriverState *, s->num_children);
     opened = g_new0(bool, s->num_children);
+    s->ignore_errors = g_new0(bool, s->num_children);
 
     for (i = 0; i < s->num_children; i++) {
         char indexstr[32];
         ret = snprintf(indexstr, 32, "children.%d", i);
         assert(ret < 32);
 
+        ret = parse_children_options(s, options, indexstr, i, &local_err);
+        if (ret < 0) {
+            goto close_exit;
+        }
+
         ret = bdrv_open_image(&s->bs[i], NULL, options, indexstr, bs,
                               &child_format, false, &local_err);
         if (ret < 0) {
@@ -979,6 +1048,7 @@ static void quorum_close(BlockDriverState *bs)
     }
 
     g_free(s->bs);
+    g_free(s->ignore_errors);
 }
 
 static void quorum_detach_aio_context(BlockDriverState *bs)
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [Qemu-devel] [PATCH COLO-Block v6 16/16] Implement new driver for block replication
  2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
                   ` (14 preceding siblings ...)
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors Wen Congyang
@ 2015-06-18  8:49 ` Wen Congyang
  15 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18  8:49 UTC (permalink / raw)
  To: qemu devel, Fam Zheng, Max Reitz, Paolo Bonzini
  Cc: Kevin Wolf, qemu block, Lai Jiangshan, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, Gonglei, Stefan Hajnoczi, Yang Hongyang,
	zhanghailiang

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 block/Makefile.objs |   1 +
 block/replication.c | 441 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 442 insertions(+)
 create mode 100644 block/replication.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index f068666..84952b1 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -23,6 +23,7 @@ block-obj-$(CONFIG_LIBSSH2) += ssh.o
 block-obj-y += accounting.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
+block-obj-y += replication.o
 
 common-obj-y += stream.o
 common-obj-y += commit.o
diff --git a/block/replication.c b/block/replication.c
new file mode 100644
index 0000000..da3b512
--- /dev/null
+++ b/block/replication.c
@@ -0,0 +1,441 @@
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "block/nbd.h"
+
+typedef struct BDRVReplicationState {
+    ReplicationMode mode;
+    int replication_state;
+    char *export_name;
+    NBDExport *exp;
+    BlockDriverState *active_disk;
+    BlockDriverState *hidden_disk;
+    BlockDriverState *secondary_disk; /* nbd target */
+    int error;
+} BDRVReplicationState;
+
+enum {
+    BLOCK_REPLICATION_NONE,     /* block replication is not started */
+    BLOCK_REPLICATION_RUNNING,  /* block replication is running */
+    BLOCK_REPLICATION_DONE,     /* block replication is done(failover) */
+};
+
+#define COMMIT_CLUSTER_BITS 16
+#define COMMIT_CLUSTER_SIZE (1 << COMMIT_CLUSTER_BITS)
+#define COMMIT_SECTORS_PER_CLUSTER (COMMIT_CLUSTER_SIZE / BDRV_SECTOR_SIZE)
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp);
+
+#define NBD_OPT_EXPORT          "export"
+#define REPLICATION_MODE        "mode"
+static QemuOptsList replication_runtime_opts = {
+    .name = "replication",
+    .head = QTAILQ_HEAD_INITIALIZER(replication_runtime_opts.head),
+    .desc = {
+        {
+            .name = REPLICATION_MODE,
+            .type = QEMU_OPT_STRING,
+        },
+        {
+            .name = NBD_OPT_EXPORT,
+            .type = QEMU_OPT_STRING,
+            .help = "The NBD server name",
+        },
+        { /* end of list */ }
+    },
+};
+
+static int replication_open(BlockDriverState *bs, QDict *options,
+                            int flags, Error **errp)
+{
+    int ret;
+    BDRVReplicationState *s = bs->opaque;;
+    Error *local_err = NULL;
+    QemuOpts *opts = NULL;
+    const char *mode;
+
+    ret = -EINVAL;
+    opts = qemu_opts_create(&replication_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, options, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    mode = qemu_opt_get(opts, REPLICATION_MODE);
+    if (!mode) {
+        error_setg(&local_err, "Missing the option mode");
+        goto fail;
+    }
+
+    if (!strcmp(mode, "primary")) {
+        s->mode = REPLICATION_MODE_PRIMARY;
+    } else if (!strcmp(mode, "secondary")) {
+        s->mode = REPLICATION_MODE_SECONDARY;
+    } else {
+        error_setg(&local_err,
+                   "The option mode's value should be primary or secondary");
+        goto fail;
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        s->export_name = g_strdup(qemu_opt_get(opts, NBD_OPT_EXPORT));
+        if (!s->export_name) {
+            error_setg(&local_err, "Missing the option export");
+            goto fail;
+        }
+    }
+
+    return 0;
+
+fail:
+    qemu_opts_del(opts);
+    /* propagate error */
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+    return ret;
+}
+
+static void replication_close(BlockDriverState *bs)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state == BLOCK_REPLICATION_RUNNING) {
+        replication_stop(bs, false, NULL);
+    }
+
+    g_free(s->export_name);
+}
+
+static int64_t replication_getlength(BlockDriverState *bs)
+{
+    return bdrv_getlength(bs->file);
+}
+
+static int replication_get_io_status(BDRVReplicationState *s)
+{
+    switch (s->replication_state) {
+    case BLOCK_REPLICATION_NONE:
+        return -EIO;
+    case BLOCK_REPLICATION_RUNNING:
+        return 0;
+    case BLOCK_REPLICATION_DONE:
+        return s->mode == REPLICATION_MODE_PRIMARY ? -EIO : 1;
+    default:
+        abort();
+    }
+}
+
+static int replication_return_value(BDRVReplicationState *s, int ret)
+{
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        return ret;
+    }
+
+    if (ret < 0) {
+        s->error = ret;
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static coroutine_fn int replication_co_readv(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int remaining_sectors,
+                                             QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    if (s->mode == REPLICATION_MODE_PRIMARY) {
+        /* We only use it to forward primary write requests */
+        return -EIO;
+    }
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /*
+     * After failover, because we don't commit active disk/hidden disk
+     * to secondary disk(nbd target), so we should read from active disk
+     * directly.
+     */
+    ret = bdrv_co_readv(bs->file, sector_num, remaining_sectors, qiov);
+    return replication_return_value(s, ret);
+}
+
+static coroutine_fn int replication_co_writev(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int remaining_sectors,
+                                              QEMUIOVector *qiov)
+{
+    BDRVReplicationState *s = bs->opaque;
+    QEMUIOVector hd_qiov;
+    uint64_t bytes_done = 0;
+    BlockDriverState *top = bs->file;
+    BlockDriverState *base = s->secondary_disk;
+    BlockDriverState *target;
+    int ret, n;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 0) {
+        ret = bdrv_co_writev(bs->file, sector_num, remaining_sectors, qiov);
+        return replication_return_value(s, ret);
+    }
+
+    /*
+     * Only write to active disk if the sectors have
+     * already been allocated in active disk/hidden disk.
+     */
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    while (remaining_sectors > 0) {
+        ret = bdrv_is_allocated_above(top, base, sector_num,
+                                      remaining_sectors, &n);
+        if (ret < 0) {
+            return ret;
+        }
+
+        qemu_iovec_reset(&hd_qiov);
+        qemu_iovec_concat(&hd_qiov, qiov, bytes_done, n * 512);
+
+        target = ret ? top: base;
+        ret = bdrv_co_writev(target, sector_num, n, &hd_qiov);
+        if (ret < 0) {
+            return ret;
+        }
+
+        remaining_sectors -= n;
+        sector_num += n;
+        bytes_done += n * BDRV_SECTOR_SIZE;
+    }
+
+    return 0;
+}
+
+static coroutine_fn int replication_co_discard(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int ret;
+
+    ret = replication_get_io_status(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret == 1) {
+        /* It is secondary qemu and we are after failover */
+        ret = bdrv_co_discard(s->secondary_disk, sector_num, nb_sectors);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    ret = bdrv_co_discard(bs->file, sector_num, nb_sectors);
+    return replication_return_value(s, ret);
+}
+
+static bool replication_recurse_is_first_non_filter(BlockDriverState *bs,
+                                                    BlockDriverState *candidate)
+{
+    return bdrv_recurse_is_first_non_filter(bs->file, candidate);
+}
+
+static void secondary_do_checkpoint(BDRVReplicationState *s, Error **errp)
+{
+    Error *local_err = NULL;
+    int ret;
+
+    if (!s->secondary_disk->job) {
+        error_setg(errp, "Backup job is cancelled unexpectedly");
+        return;
+    }
+
+    block_job_do_checkpoint(s->secondary_disk->job, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ret = s->active_disk->drv->bdrv_make_empty(s->active_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make active disk empty");
+        return;
+    }
+
+    ret = s->hidden_disk->drv->bdrv_make_empty(s->hidden_disk);
+    if (ret < 0) {
+        error_setg(errp, "Cannot make hidden disk empty");
+        return;
+    }
+}
+
+static void replication_start(BlockDriverState *bs, ReplicationMode mode,
+                              Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+    int64_t active_length, hidden_length, nbd_length;
+    Error *local_err = NULL;
+
+    if (s->replication_state != BLOCK_REPLICATION_NONE) {
+        error_setg(errp, "Block replication is running or done");
+        return;
+    }
+
+    if (s->mode != mode) {
+        error_setg(errp, "Invalid parameter 'mode'");
+        return;
+    }
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        bdrv_connect(bs->file, errp);
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        if (!bs->file->backing_reference) {
+            error_setg(errp, "Active disk doesn't use backing_reference");
+        }
+
+        s->active_disk = bs->file;
+        s->hidden_disk = s->active_disk->backing_hd;
+        s->secondary_disk = s->hidden_disk->backing_hd;
+
+        if (!s->secondary_disk->job ||
+            s->secondary_disk->job->driver->job_type != BLOCK_JOB_TYPE_BACKUP) {
+            error_setg(errp, "Backup job is cancelled unexpectedly");
+            return;
+        }
+
+        /* verify the length */
+        active_length = bdrv_getlength(s->active_disk);
+        hidden_length = bdrv_getlength(s->hidden_disk);
+        nbd_length = bdrv_getlength(s->secondary_disk);
+        if (active_length < 0 || hidden_length < 0 || nbd_length < 0 ||
+            active_length != hidden_length || hidden_length != nbd_length) {
+            error_setg(errp, "active disk, hidden disk, nbd target's length"
+                       " are not the same");
+            return;
+        }
+
+        if (!s->active_disk->drv->bdrv_make_empty ||
+            !s->hidden_disk->drv->bdrv_make_empty) {
+            error_setg(errp,
+                       "active disk or hidden disk doesn't support make_empty");
+            return;
+        }
+        break;
+    default:
+        abort();
+    }
+
+    s->replication_state = BLOCK_REPLICATION_RUNNING;
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+
+        /* start NBD server */
+        s->exp = nbd_export_new(s->secondary_disk->blk, 0, -1, 0, NULL,
+                                &local_err);
+        if (!s->exp) {
+            s->replication_state = BLOCK_REPLICATION_NONE;
+            error_propagate(errp, local_err);
+            return;
+        }
+        nbd_export_set_name(s->exp, s->export_name);
+    }
+}
+
+static void replication_do_checkpoint(BlockDriverState *bs, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    if (s->error) {
+        error_setg(errp, "I/O error occurs");
+        return;
+    }
+
+    if (s->mode == REPLICATION_MODE_SECONDARY) {
+        secondary_do_checkpoint(s, errp);
+    }
+}
+
+static void replication_stop(BlockDriverState *bs, bool failover, Error **errp)
+{
+    BDRVReplicationState *s = bs->opaque;
+
+    if (s->replication_state != BLOCK_REPLICATION_RUNNING) {
+        error_setg(errp, "Block replication is not running");
+        return;
+    }
+
+    s->replication_state = BLOCK_REPLICATION_DONE;
+
+    switch (s->mode) {
+    case REPLICATION_MODE_PRIMARY:
+        bdrv_disconnect(bs->file);
+        break;
+    case REPLICATION_MODE_SECONDARY:
+        /* stop NBD server */
+        nbd_export_close(s->exp);
+        nbd_export_put(s->exp);
+
+        if (!failover) {
+            secondary_do_checkpoint(s, errp);
+            return;
+        }
+
+        if (!s->secondary_disk->job ||
+            s->secondary_disk->job->driver->job_type != BLOCK_JOB_TYPE_BACKUP) {
+            error_setg(errp, "Backup job is cancelled unexpectedly");
+            return;
+        }
+
+        block_job_cancel(s->secondary_disk->job);
+        break;
+    default:
+        abort();
+    }
+}
+
+BlockDriver bdrv_replication = {
+    .format_name                = "replication",
+    .protocol_name              = "replication",
+    .instance_size              = sizeof(BDRVReplicationState),
+
+    .bdrv_open                  = replication_open,
+    .bdrv_close                 = replication_close,
+
+    .bdrv_getlength             = replication_getlength,
+    .bdrv_co_readv              = replication_co_readv,
+    .bdrv_co_writev             = replication_co_writev,
+    .bdrv_co_discard            = replication_co_discard,
+
+    .is_filter                  = true,
+    .bdrv_recurse_is_first_non_filter = replication_recurse_is_first_non_filter,
+
+    .bdrv_start_replication     = replication_start,
+    .bdrv_do_checkpoint         = replication_do_checkpoint,
+    .bdrv_stop_replication      = replication_stop,
+
+    .has_variable_length        = true,
+};
+
+static void bdrv_replication_init(void)
+{
+    bdrv_register(&bdrv_replication);
+}
+
+block_init(bdrv_replication_init);
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description Wen Congyang
@ 2015-06-18 10:34   ` Stefan Hajnoczi
  2015-06-18 10:51     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 10:34 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

On Thu, Jun 18, 2015 at 04:49:06PM +0800, Wen Congyang wrote:
> +== Failure Handling ==
> +There are 6 internal errors when block replication is running:
> +1. I/O error on primary disk
> +2. Forwarding primay write requests failed

s/primay/primary/

> +3. Bacup failed or writing to secondary disk failed

s/Bacup/Backup/

Is "writing to secondary disk failed" a duplicate of #4?

> +4. I/O error on secondary disk

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS Wen Congyang
@ 2015-06-18 10:40   ` Stefan Hajnoczi
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 10:40 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 578 bytes --]

On Thu, Jun 18, 2015 at 04:49:08PM +0800, Wen Congyang wrote:
> When opening BDS, we need to create backup jobs for
> image-fleecing.
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Cc: Jeff Cody <jcody@redhat.com>
> ---
>  block/Makefile.objs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

mirror.o is already in block-obj so moving backup.o from common-obj
should be okay.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
@ 2015-06-18 10:50   ` Stefan Hajnoczi
  2015-06-18 11:40     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 10:50 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 1619 bytes --]

On Thu, Jun 18, 2015 at 04:49:09PM +0800, Wen Congyang wrote:
> Usage:
> -drive file=xxx,id=Y, \
> -drive file=xxxx,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*
> 
> It will create such backing chain:
>                {virtio-blk dev 'Y'}                                      {virtio-blk dev 'X'}
>                          |                                                          |
>                          |                                                          |
>                          v                                                          v
> 
>     [base] <- [mid] <- ( Y )  <----------------- (hidden target) <--------------- ( X )
> 
>                          v                              ^
>                          v                              ^
>                          v                              ^
>                          v                              ^
>                          >>>> drive-backup sync=none >>>>
> 
> X's backing file is hidden-disk, and hidden-disk's backing file is Y.
> Disk Y may be opened or reopened in read-write mode, so A block backup
> job is automatically created: source is Y and target is hidden disk.
> Active disk X, hidden disk, and Y are all on the same AioContext.

The command-line option name confused me.  I expected "backing_reference" to result in:

  Y <- X

The backup block job and hidden target makes this option more than just
referencing the backing device.

The name "backing.backup-reference" is clearer to me, but I think Kevin
might have more idea for you on this patch.

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description
  2015-06-18 10:34   ` Stefan Hajnoczi
@ 2015-06-18 10:51     ` Wen Congyang
  0 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18 10:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

On 06/18/2015 06:34 PM, Stefan Hajnoczi wrote:
> On Thu, Jun 18, 2015 at 04:49:06PM +0800, Wen Congyang wrote:
>> +== Failure Handling ==
>> +There are 6 internal errors when block replication is running:
>> +1. I/O error on primary disk
>> +2. Forwarding primay write requests failed
> 
> s/primay/primary/
> 
>> +3. Bacup failed or writing to secondary disk failed
> 
> s/Bacup/Backup/
> 
> Is "writing to secondary disk failed" a duplicate of #4?

Yes, will fix it in the next version.

Thanks
Wen Congyang

> 
>> +4. I/O error on secondary disk

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS
  2015-06-18 10:50   ` Stefan Hajnoczi
@ 2015-06-18 11:40     ` Wen Congyang
  0 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18 11:40 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fam Zheng
  Cc: Kevin Wolf, Lai Jiangshan, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang, Dr. David Alan Gilbert, zhanghailiang

On 06/18/2015 06:50 PM, Stefan Hajnoczi wrote:
> On Thu, Jun 18, 2015 at 04:49:09PM +0800, Wen Congyang wrote:
>> Usage:
>> -drive file=xxx,id=Y, \
>> -drive file=xxxx,id=X,backing_reference.drive_id=Y,backing_reference.hidden-disk.*
>>
>> It will create such backing chain:
>>                {virtio-blk dev 'Y'}                                      {virtio-blk dev 'X'}
>>                          |                                                          |
>>                          |                                                          |
>>                          v                                                          v
>>
>>     [base] <- [mid] <- ( Y )  <----------------- (hidden target) <--------------- ( X )
>>
>>                          v                              ^
>>                          v                              ^
>>                          v                              ^
>>                          v                              ^
>>                          >>>> drive-backup sync=none >>>>
>>
>> X's backing file is hidden-disk, and hidden-disk's backing file is Y.
>> Disk Y may be opened or reopened in read-write mode, so A block backup
>> job is automatically created: source is Y and target is hidden disk.
>> Active disk X, hidden disk, and Y are all on the same AioContext.
> 
> The command-line option name confused me.  I expected "backing_reference" to result in:
> 
>   Y <- X

So if I want to create
Y <- hidden-disk <- X
The command line can be
backing.file.filename=xxx,backing.driver=xxx,backing.backing.backing_reference=Y

> 
> The backup block job and hidden target makes this option more than just
> referencing the backing device.
> 
> The name "backing.backup-reference" is clearer to me, but I think Kevin
> might have more idea for you on this patch.
> 

Any suggestion is welcome.

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target Wen Congyang
@ 2015-06-18 12:47   ` Stefan Hajnoczi
  2015-06-18 15:17     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 12:47 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 1252 bytes --]

On Thu, Jun 18, 2015 at 04:49:11PM +0800, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/block.c b/block.c
> index d1ed227..0b41af4 100644
> --- a/block.c
> +++ b/block.c
> @@ -1294,6 +1294,14 @@ static int bdrv_open_backing_reference_file(BlockDriverState *bs,
>      }
>  
>      backing_hd = blk_bs(backing_blk);
> +    /* Don't allow a disk use backing reference target */
> +    ret = blk_attach_dev(backing_hd->blk, bs);
> +    if (ret < 0) {
> +        error_setg(errp, "backing_hd %s is used by the other device model",
> +                   backing_name);
> +        goto free_exit;
> +    }

Can you explain the purpose of this patch?

I'm not sure blk_attach_dev() is the appropriate API.  It is only used
by emulated devices but not by the run-time NBD server, for example.
This means you are not preventing all other users from accessing this
BDS.

The op blockers mechanism is normally used to prevent operations on a
BDS.  See bdrv_op_is_blocked() and bdrv_op_block().

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target Wen Congyang
@ 2015-06-18 12:55   ` Stefan Hajnoczi
  2015-06-18 14:36     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 12:55 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 877 bytes --]

On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> +void bdrv_connect(BlockDriverState *bs, Error **errp)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_connect) {
> +        drv->bdrv_connect(bs, errp);
> +    } else if (bs->file) {
> +        bdrv_connect(bs->file, errp);
> +    } else {
> +        error_setg(errp, "this feature or command is not currently supported");
> +    }
> +}
> +
> +void bdrv_disconnect(BlockDriverState *bs)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    if (drv && drv->bdrv_disconnect) {
> +        drv->bdrv_disconnect(bs);
> +    } else if (bs->file) {
> +        bdrv_disconnect(bs->file);
> +    }
> +}

Please add doc comments describing the semantics of these commands.

Why are these operations needed when there is already a bs->drv == NULL
case which means the BDS is not ready for read/write?

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for block replication
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for " Wen Congyang
@ 2015-06-18 13:06   ` Stefan Hajnoczi
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 13:06 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Alberto Garcia, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 4104 bytes --]

On Thu, Jun 18, 2015 at 04:49:18PM +0800, Wen Congyang wrote:

CCing Alberto Garcia for the quorum block driver.

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block/quorum.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 78 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 77e55b2..01cfac0 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -82,6 +82,8 @@ typedef struct BDRVQuorumState {
>                              */
>  
>      QuorumReadPattern read_pattern;
> +
> +    int replication_index; /* store which child supports block replication */
>  } BDRVQuorumState;
>  
>  typedef struct QuorumAIOCB QuorumAIOCB;
> @@ -945,6 +947,7 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      g_free(opened);
> +    s->replication_index = -1;
>      goto exit;
>  
>  close_exit:
> @@ -1032,6 +1035,77 @@ static void quorum_refresh_filename(BlockDriverState *bs)
>      bs->full_open_options = opts;
>  }
>  
> +static void quorum_start_replication(BlockDriverState *bs, ReplicationMode mode,
> +                                     Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +    int count = 0, i, index;
> +    Error *local_err = NULL;
> +
> +    /*
> +     * TODO: support REPLICATION_MODE_SECONDARY if we allow secondary
> +     * QEMU becoming primary QEMU.
> +     */
> +    if (mode != REPLICATION_MODE_PRIMARY) {
> +        error_setg(errp, "Invalid parameter 'mode'");
> +        return;
> +    }
> +
> +    if (s->read_pattern != QUORUM_READ_PATTERN_FIFO) {
> +        error_setg(errp, "Invalid parameter 'read pattern'");
> +        return;
> +    }
> +
> +    for (i = 0; i < s->num_children; i++) {
> +        bdrv_start_replication(s->bs[i], mode, &local_err);
> +        if (local_err) {
> +            error_free(local_err);
> +            local_err = NULL;
> +        } else {
> +            count++;
> +            index = i;
> +        }
> +    }
> +
> +    if (count == 0) {
> +        /* No child supports block replication */
> +        error_setg(errp, "this feature or command is not currently supported");
> +    } else if (count > 1) {
> +        for (i = 0; i < s->num_children; i++) {
> +            bdrv_stop_replication(s->bs[i], false, NULL);
> +        }
> +        error_setg(errp, "too many children support block replication");
> +    } else {
> +        s->replication_index = index;
> +    }
> +}
> +
> +static void quorum_do_checkpoint(BlockDriverState *bs, Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +
> +    if (s->replication_index < 0) {
> +        error_setg(errp, "Block replication is not started");
> +        return;
> +    }
> +
> +    bdrv_do_checkpoint(s->bs[s->replication_index], errp);
> +}
> +
> +static void quorum_stop_replication(BlockDriverState *bs, bool failover,
> +                                    Error **errp)
> +{
> +    BDRVQuorumState *s = bs->opaque;
> +
> +    if (s->replication_index < 0) {
> +        error_setg(errp, "Block replication is not started");
> +        return;
> +    }
> +
> +    bdrv_stop_replication(s->bs[s->replication_index], failover, errp);
> +    s->replication_index = -1;
> +}
> +
>  static BlockDriver bdrv_quorum = {
>      .format_name                        = "quorum",
>      .protocol_name                      = "quorum",
> @@ -1055,6 +1129,10 @@ static BlockDriver bdrv_quorum = {
>  
>      .is_filter                          = true,
>      .bdrv_recurse_is_first_non_filter   = quorum_recurse_is_first_non_filter,
> +
> +    .bdrv_start_replication             = quorum_start_replication,
> +    .bdrv_do_checkpoint                 = quorum_do_checkpoint,
> +    .bdrv_stop_replication              = quorum_stop_replication,
>  };
>  
>  static void bdrv_quorum_init(void)
> -- 
> 2.4.3
> 
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors
  2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors Wen Congyang
@ 2015-06-18 13:06   ` Stefan Hajnoczi
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 13:06 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Alberto Garcia, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 7177 bytes --]

On Thu, Jun 18, 2015 at 04:49:20PM +0800, Wen Congyang wrote:

CCing Alberto Garcia for the quorum block driver.

> If the child is not ready, read/write/getlength/flush will
> return -errno. It is not critical error, and can be ignored:
> 1. read/write:
>    Just not report the error event.
> 2. getlength:
>    just ignore it. If all children's getlength return -errno,
>    and be ignored, return -EIO.
> 3. flush:
>    Just ignore it. If all children's getlength return -errno,
>    and be ignored, return 0.
> 
> Usage: children.x.ignore-errors=true
> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  block/quorum.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 77 insertions(+), 7 deletions(-)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 01cfac0..c5dbb69 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -30,6 +30,7 @@
>  #define QUORUM_OPT_BLKVERIFY      "blkverify"
>  #define QUORUM_OPT_REWRITE        "rewrite-corrupted"
>  #define QUORUM_OPT_READ_PATTERN   "read-pattern"
> +#define QUORUM_CHILDREN_OPT_IGNORE_ERRORS   "ignore-errors"
>  
>  /* This union holds a vote hash value */
>  typedef union QuorumVoteValue {
> @@ -65,6 +66,7 @@ typedef struct QuorumVotes {
>  /* the following structure holds the state of one quorum instance */
>  typedef struct BDRVQuorumState {
>      BlockDriverState **bs; /* children BlockDriverStates */
> +    bool *ignore_errors;   /* ignore children's error? */
>      int num_children;      /* children count */
>      int threshold;         /* if less than threshold children reads gave the
>                              * same result a quorum error occurs.
> @@ -99,6 +101,7 @@ typedef struct QuorumChildRequest {
>      uint8_t *buf;
>      int ret;
>      QuorumAIOCB *parent;
> +    int index;
>  } QuorumChildRequest;
>  
>  /* Quorum will use the following structure to track progress of each read/write
> @@ -211,6 +214,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
>          acb->qcrs[i].buf = NULL;
>          acb->qcrs[i].ret = 0;
>          acb->qcrs[i].parent = acb;
> +        acb->qcrs[i].index = i;
>      }
>  
>      return acb;
> @@ -304,7 +308,7 @@ static void quorum_aio_cb(void *opaque, int ret)
>      acb->count++;
>      if (ret == 0) {
>          acb->success_count++;
> -    } else {
> +    } else if (!s->ignore_errors[sacb->index]) {
>          quorum_report_bad(acb, sacb->aiocb->bs->node_name, ret);
>      }
>      assert(acb->count <= s->num_children);
> @@ -719,19 +723,31 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
>  static int64_t quorum_getlength(BlockDriverState *bs)
>  {
>      BDRVQuorumState *s = bs->opaque;
> -    int64_t result;
> +    int64_t result = -EIO;
>      int i;
>  
>      /* check that all file have the same length */
> -    result = bdrv_getlength(s->bs[0]);
> -    if (result < 0) {
> -        return result;
> -    }
> -    for (i = 1; i < s->num_children; i++) {
> +    for (i = 0; i < s->num_children; i++) {
>          int64_t value = bdrv_getlength(s->bs[i]);
> +
>          if (value < 0) {
>              return value;
>          }
> +
> +        if (value == 0 && s->ignore_errors[i]) {
> +            /*
> +             * If the child is not ready, it cannot return -errno,
> +             * otherwise refresh_total_sectors() will fail when
> +             * we open the child.
> +             */
> +            continue;
> +        }
> +
> +        if (result == -EIO) {
> +            result = value;
> +            continue;
> +        }
> +
>          if (value != result) {
>              return -EIO;
>          }
> @@ -769,6 +785,9 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
>  
>      for (i = 0; i < s->num_children; i++) {
>          result = bdrv_co_flush(s->bs[i]);
> +        if (result < 0 && s->ignore_errors[i]) {
> +            result = 0;
> +        }
>          result_value.l = result;
>          quorum_count_vote(&error_votes, &result_value, i);
>      }
> @@ -843,6 +862,19 @@ static QemuOptsList quorum_runtime_opts = {
>      },
>  };
>  
> +static QemuOptsList quorum_children_common_opts = {
> +    .name = "quorum children",
> +    .head = QTAILQ_HEAD_INITIALIZER(quorum_children_common_opts.head),
> +    .desc = {
> +        {
> +            .name = QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
> +            .type = QEMU_OPT_BOOL,
> +            .help = "ignore child I/O error",
> +        },
> +        { /* end of list */ }
> +    },
> +};
> +
>  static int parse_read_pattern(const char *opt)
>  {
>      int i;
> @@ -861,6 +893,37 @@ static int parse_read_pattern(const char *opt)
>      return -EINVAL;
>  }
>  
> +static int parse_children_options(BDRVQuorumState *s, QDict *options,
> +                                  const char *indexstr, int index,
> +                                  Error **errp)
> +{
> +    QemuOpts *children_opts = NULL;
> +    Error *local_err = NULL;
> +    int ret = 0;
> +    bool value;
> +
> +    children_opts = qemu_opts_create(&quorum_children_common_opts, NULL, 0,
> +                                     &error_abort);
> +    qemu_opts_absorb_qdict_by_index(children_opts, options, indexstr,
> +                                    &local_err);
> +    if (local_err) {
> +        ret = -EINVAL;
> +        goto out;
> +    }
> +
> +    value = qemu_opt_get_bool(children_opts, QUORUM_CHILDREN_OPT_IGNORE_ERRORS,
> +                              false);
> +    s->ignore_errors[index] = value;
> +
> +out:
> +    qemu_opts_del(children_opts);
> +    /* propagate error */
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }
> +    return ret;
> +}
> +
>  static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>                         Error **errp)
>  {
> @@ -931,12 +994,18 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
>      /* allocate the children BlockDriverState array */
>      s->bs = g_new0(BlockDriverState *, s->num_children);
>      opened = g_new0(bool, s->num_children);
> +    s->ignore_errors = g_new0(bool, s->num_children);
>  
>      for (i = 0; i < s->num_children; i++) {
>          char indexstr[32];
>          ret = snprintf(indexstr, 32, "children.%d", i);
>          assert(ret < 32);
>  
> +        ret = parse_children_options(s, options, indexstr, i, &local_err);
> +        if (ret < 0) {
> +            goto close_exit;
> +        }
> +
>          ret = bdrv_open_image(&s->bs[i], NULL, options, indexstr, bs,
>                                &child_format, false, &local_err);
>          if (ret < 0) {
> @@ -979,6 +1048,7 @@ static void quorum_close(BlockDriverState *bs)
>      }
>  
>      g_free(s->bs);
> +    g_free(s->ignore_errors);
>  }
>  
>  static void quorum_detach_aio_context(BlockDriverState *bs)
> -- 
> 2.4.3
> 
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-18 12:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
@ 2015-06-18 14:36     ` Wen Congyang
  2015-06-18 16:06       ` Stefan Hajnoczi
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-18 14:36 UTC (permalink / raw)
  To: Stefan Hajnoczi, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +
>> +    if (drv && drv->bdrv_connect) {
>> +        drv->bdrv_connect(bs, errp);
>> +    } else if (bs->file) {
>> +        bdrv_connect(bs->file, errp);
>> +    } else {
>> +        error_setg(errp, "this feature or command is not currently supported");
>> +    }
>> +}
>> +
>> +void bdrv_disconnect(BlockDriverState *bs)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +
>> +    if (drv && drv->bdrv_disconnect) {
>> +        drv->bdrv_disconnect(bs);
>> +    } else if (bs->file) {
>> +        bdrv_disconnect(bs->file);
>> +    }
>> +}
>
> Please add doc comments describing the semantics of these commands.

Where should it be documented? In the header file?

>
> Why are these operations needed when there is already a bs->drv == NULL
> case which means the BDS is not ready for read/write?
>

The purpos is that: don't connect to nbd server when opening a nbd 
client. connect/disconnect
to nbd server when we need to do it.

IIUC, if bs->drv is NULL, it means that the driver is ejected? Here, 
connect/disconnect
means that connect/disconnect to remote target(The target may be in 
another host).

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target
  2015-06-18 12:47   ` Stefan Hajnoczi
@ 2015-06-18 15:17     ` Wen Congyang
  0 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-18 15:17 UTC (permalink / raw)
  To: Stefan Hajnoczi, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block, Jiang Yunhong,
	Dong Eddie, qemu devel, Max Reitz, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	zhanghailiang

At 2015/6/18 20:47, Stefan Hajnoczi Wrote:
> On Thu, Jun 18, 2015 at 04:49:11PM +0800, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> ---
>>   block.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index d1ed227..0b41af4 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1294,6 +1294,14 @@ static int bdrv_open_backing_reference_file(BlockDriverState *bs,
>>       }
>>
>>       backing_hd = blk_bs(backing_blk);
>> +    /* Don't allow a disk use backing reference target */
>> +    ret = blk_attach_dev(backing_hd->blk, bs);
>> +    if (ret < 0) {
>> +        error_setg(errp, "backing_hd %s is used by the other device model",
>> +                   backing_name);
>> +        goto free_exit;
>> +    }
>
> Can you explain the purpose of this patch?
>
> I'm not sure blk_attach_dev() is the appropriate API.  It is only used
> by emulated devices but not by the run-time NBD server, for example.
> This means you are not preventing all other users from accessing this
> BDS.
>
> The op blockers mechanism is normally used to prevent operations on a
> BDS.  See bdrv_op_is_blocked() and bdrv_op_block().

NBD server will write to this BDS(backing reference target). So I cannot use
bdrv_op_block(). If some emulated devices also write to it, the data will
be broken. The purpose is that: this BDS cannot be attached, and any 
emulated
devices will meet some erros if it tries to attach to this BDS.

Thanks
Wen Congyang

>
> Stefan
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-18 14:36     ` Wen Congyang
@ 2015-06-18 16:06       ` Stefan Hajnoczi
  2015-06-19  0:54         ` Wen Congyang
  2015-06-23  6:44         ` Wen Congyang
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-18 16:06 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 1922 bytes --]

On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> >On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> >>+void bdrv_connect(BlockDriverState *bs, Error **errp)
> >>+{
> >>+    BlockDriver *drv = bs->drv;
> >>+
> >>+    if (drv && drv->bdrv_connect) {
> >>+        drv->bdrv_connect(bs, errp);
> >>+    } else if (bs->file) {
> >>+        bdrv_connect(bs->file, errp);
> >>+    } else {
> >>+        error_setg(errp, "this feature or command is not currently supported");
> >>+    }
> >>+}
> >>+
> >>+void bdrv_disconnect(BlockDriverState *bs)
> >>+{
> >>+    BlockDriver *drv = bs->drv;
> >>+
> >>+    if (drv && drv->bdrv_disconnect) {
> >>+        drv->bdrv_disconnect(bs);
> >>+    } else if (bs->file) {
> >>+        bdrv_disconnect(bs->file);
> >>+    }
> >>+}
> >
> >Please add doc comments describing the semantics of these commands.
> 
> Where should it be documented? In the header file?

block.h doesn't document prototypes in the header file, please document
the function definition in block.c.  (QEMU is not consistent here, some
places do it the other way around.)

> >Why are these operations needed when there is already a bs->drv == NULL
> >case which means the BDS is not ready for read/write?
> >
> 
> The purpos is that: don't connect to nbd server when opening a nbd client.
> connect/disconnect
> to nbd server when we need to do it.
> 
> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
> connect/disconnect
> means that connect/disconnect to remote target(The target may be in another
> host).

Connect/disconnect puts something on the QEMU command-line that isn't
ready at startup time.

How about using monitor commands to add objects when needed instead?

That is cleaner because it doesn't introduce a new state (which is only
implemented for nbd).

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-18 16:06       ` Stefan Hajnoczi
@ 2015-06-19  0:54         ` Wen Congyang
  2015-06-19 10:49           ` Stefan Hajnoczi
  2015-06-23  6:44         ` Wen Congyang
  1 sibling, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-19  0:54 UTC (permalink / raw)
  To: Stefan Hajnoczi, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>> +{
>>>> +    BlockDriver *drv = bs->drv;
>>>> +
>>>> +    if (drv && drv->bdrv_connect) {
>>>> +        drv->bdrv_connect(bs, errp);
>>>> +    } else if (bs->file) {
>>>> +        bdrv_connect(bs->file, errp);
>>>> +    } else {
>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>> +    }
>>>> +}
>>>> +
>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>> +{
>>>> +    BlockDriver *drv = bs->drv;
>>>> +
>>>> +    if (drv && drv->bdrv_disconnect) {
>>>> +        drv->bdrv_disconnect(bs);
>>>> +    } else if (bs->file) {
>>>> +        bdrv_disconnect(bs->file);
>>>> +    }
>>>> +}
>>>
>>> Please add doc comments describing the semantics of these commands.
>>
>> Where should it be documented? In the header file?
> 
> block.h doesn't document prototypes in the header file, please document
> the function definition in block.c.  (QEMU is not consistent here, some
> places do it the other way around.)
> 
>>> Why are these operations needed when there is already a bs->drv == NULL
>>> case which means the BDS is not ready for read/write?
>>>
>>
>> The purpos is that: don't connect to nbd server when opening a nbd client.
>> connect/disconnect
>> to nbd server when we need to do it.
>>
>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>> connect/disconnect
>> means that connect/disconnect to remote target(The target may be in another
>> host).
> 
> Connect/disconnect puts something on the QEMU command-line that isn't
> ready at startup time.
> 
> How about using monitor commands to add objects when needed instead?
> 
> That is cleaner because it doesn't introduce a new state (which is only
> implemented for nbd).
> 

The problem is that, nbd client is one child of quorum, and quorum must have more
than one child. The nbd server is not ready until colo is running.

Any suggestion is welcome.

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-19  0:54         ` Wen Congyang
@ 2015-06-19 10:49           ` Stefan Hajnoczi
  2015-06-20  3:31             ` Wen Congyang
  2015-06-24  1:11             ` Wen Congyang
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-19 10:49 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, zhanghailiang,
	Gonglei, Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 2545 bytes --]

On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> > On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
> >> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> >>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> >>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
> >>>> +{
> >>>> +    BlockDriver *drv = bs->drv;
> >>>> +
> >>>> +    if (drv && drv->bdrv_connect) {
> >>>> +        drv->bdrv_connect(bs, errp);
> >>>> +    } else if (bs->file) {
> >>>> +        bdrv_connect(bs->file, errp);
> >>>> +    } else {
> >>>> +        error_setg(errp, "this feature or command is not currently supported");
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +void bdrv_disconnect(BlockDriverState *bs)
> >>>> +{
> >>>> +    BlockDriver *drv = bs->drv;
> >>>> +
> >>>> +    if (drv && drv->bdrv_disconnect) {
> >>>> +        drv->bdrv_disconnect(bs);
> >>>> +    } else if (bs->file) {
> >>>> +        bdrv_disconnect(bs->file);
> >>>> +    }
> >>>> +}
> >>>
> >>> Please add doc comments describing the semantics of these commands.
> >>
> >> Where should it be documented? In the header file?
> > 
> > block.h doesn't document prototypes in the header file, please document
> > the function definition in block.c.  (QEMU is not consistent here, some
> > places do it the other way around.)
> > 
> >>> Why are these operations needed when there is already a bs->drv == NULL
> >>> case which means the BDS is not ready for read/write?
> >>>
> >>
> >> The purpos is that: don't connect to nbd server when opening a nbd client.
> >> connect/disconnect
> >> to nbd server when we need to do it.
> >>
> >> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
> >> connect/disconnect
> >> means that connect/disconnect to remote target(The target may be in another
> >> host).
> > 
> > Connect/disconnect puts something on the QEMU command-line that isn't
> > ready at startup time.
> > 
> > How about using monitor commands to add objects when needed instead?
> > 
> > That is cleaner because it doesn't introduce a new state (which is only
> > implemented for nbd).
> > 
> 
> The problem is that, nbd client is one child of quorum, and quorum must have more
> than one child. The nbd server is not ready until colo is running.

A monitor command to hot add/remove quorum children solves this problem
and could also be used in other scenarios (e.g. user decides to take a
quorum child offline).

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-19 10:49           ` Stefan Hajnoczi
@ 2015-06-20  3:31             ` Wen Congyang
  2015-06-22 12:39               ` Stefan Hajnoczi
  2015-06-24 14:07               ` Dr. David Alan Gilbert
  2015-06-24  1:11             ` Wen Congyang
  1 sibling, 2 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-20  3:31 UTC (permalink / raw)
  To: Stefan Hajnoczi, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>>>> +{
>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>> +
>>>>>> +    if (drv && drv->bdrv_connect) {
>>>>>> +        drv->bdrv_connect(bs, errp);
>>>>>> +    } else if (bs->file) {
>>>>>> +        bdrv_connect(bs->file, errp);
>>>>>> +    } else {
>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>>>> +{
>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>> +
>>>>>> +    if (drv && drv->bdrv_disconnect) {
>>>>>> +        drv->bdrv_disconnect(bs);
>>>>>> +    } else if (bs->file) {
>>>>>> +        bdrv_disconnect(bs->file);
>>>>>> +    }
>>>>>> +}
>>>>>
>>>>> Please add doc comments describing the semantics of these commands.
>>>>
>>>> Where should it be documented? In the header file?
>>>
>>> block.h doesn't document prototypes in the header file, please document
>>> the function definition in block.c.  (QEMU is not consistent here, some
>>> places do it the other way around.)
>>>
>>>>> Why are these operations needed when there is already a bs->drv == NULL
>>>>> case which means the BDS is not ready for read/write?
>>>>>
>>>>
>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
>>>> connect/disconnect
>>>> to nbd server when we need to do it.
>>>>
>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>>>> connect/disconnect
>>>> means that connect/disconnect to remote target(The target may be in another
>>>> host).
>>>
>>> Connect/disconnect puts something on the QEMU command-line that isn't
>>> ready at startup time.
>>>
>>> How about using monitor commands to add objects when needed instead?
>>>
>>> That is cleaner because it doesn't introduce a new state (which is only
>>> implemented for nbd).
>>>
>>
>> The problem is that, nbd client is one child of quorum, and quorum must have more
>> than one child. The nbd server is not ready until colo is running.
>
> A monitor command to hot add/remove quorum children solves this problem
> and could also be used in other scenarios (e.g. user decides to take a
> quorum child offline).
>

For replication case, we always do checkpoint again and again after 
migration. If the
disk is not synced before migration, we will use disk mirgation or 
mirror job to sync
it. So we cannot start block replication when migration is running. We 
need that nbd
client is not ready when migration is running, and it is ready between 
migration ends
and checkpoint begins. Using a monotir command add the nbd client will 
cause larger
downtime. So if the nbd client has been added(only not connect to the 
nbd server),
we can connect to nbd server automatically.

Thanks
Wen Congyang

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-20  3:31             ` Wen Congyang
@ 2015-06-22 12:39               ` Stefan Hajnoczi
  2015-06-22 13:56                 ` Wen Congyang
  2015-06-24 14:07               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-22 12:39 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Max Reitz, zhanghailiang, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	Lai Jiangshan

[-- Attachment #1: Type: text/plain, Size: 3525 bytes --]

On Sat, Jun 20, 2015 at 11:31:52AM +0800, Wen Congyang wrote:
> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
> >On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
> >>On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> >>>On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
> >>>>At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> >>>>>On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> >>>>>>+void bdrv_connect(BlockDriverState *bs, Error **errp)
> >>>>>>+{
> >>>>>>+    BlockDriver *drv = bs->drv;
> >>>>>>+
> >>>>>>+    if (drv && drv->bdrv_connect) {
> >>>>>>+        drv->bdrv_connect(bs, errp);
> >>>>>>+    } else if (bs->file) {
> >>>>>>+        bdrv_connect(bs->file, errp);
> >>>>>>+    } else {
> >>>>>>+        error_setg(errp, "this feature or command is not currently supported");
> >>>>>>+    }
> >>>>>>+}
> >>>>>>+
> >>>>>>+void bdrv_disconnect(BlockDriverState *bs)
> >>>>>>+{
> >>>>>>+    BlockDriver *drv = bs->drv;
> >>>>>>+
> >>>>>>+    if (drv && drv->bdrv_disconnect) {
> >>>>>>+        drv->bdrv_disconnect(bs);
> >>>>>>+    } else if (bs->file) {
> >>>>>>+        bdrv_disconnect(bs->file);
> >>>>>>+    }
> >>>>>>+}
> >>>>>
> >>>>>Please add doc comments describing the semantics of these commands.
> >>>>
> >>>>Where should it be documented? In the header file?
> >>>
> >>>block.h doesn't document prototypes in the header file, please document
> >>>the function definition in block.c.  (QEMU is not consistent here, some
> >>>places do it the other way around.)
> >>>
> >>>>>Why are these operations needed when there is already a bs->drv == NULL
> >>>>>case which means the BDS is not ready for read/write?
> >>>>>
> >>>>
> >>>>The purpos is that: don't connect to nbd server when opening a nbd client.
> >>>>connect/disconnect
> >>>>to nbd server when we need to do it.
> >>>>
> >>>>IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
> >>>>connect/disconnect
> >>>>means that connect/disconnect to remote target(The target may be in another
> >>>>host).
> >>>
> >>>Connect/disconnect puts something on the QEMU command-line that isn't
> >>>ready at startup time.
> >>>
> >>>How about using monitor commands to add objects when needed instead?
> >>>
> >>>That is cleaner because it doesn't introduce a new state (which is only
> >>>implemented for nbd).
> >>>
> >>
> >>The problem is that, nbd client is one child of quorum, and quorum must have more
> >>than one child. The nbd server is not ready until colo is running.
> >
> >A monitor command to hot add/remove quorum children solves this problem
> >and could also be used in other scenarios (e.g. user decides to take a
> >quorum child offline).
> >
> 
> For replication case, we always do checkpoint again and again after
> migration. If the
> disk is not synced before migration, we will use disk mirgation or mirror
> job to sync
> it. So we cannot start block replication when migration is running. We need
> that nbd
> client is not ready when migration is running, and it is ready between
> migration ends
> and checkpoint begins. Using a monotir command add the nbd client will cause
> larger
> downtime. So if the nbd client has been added(only not connect to the nbd
> server),
> we can connect to nbd server automatically.

I'm not sure I understand you.  Can you rephrase this?

The NBD connect should not be performed during downtime, regardless of
whether a monitor command is used or not.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-22 12:39               ` Stefan Hajnoczi
@ 2015-06-22 13:56                 ` Wen Congyang
  2015-06-23 13:42                   ` Stefan Hajnoczi
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-22 13:56 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu block, Jiang Yunhong, Dong Eddie,
	qemu devel, Max Reitz, zhanghailiang, Gonglei, Stefan Hajnoczi,
	Paolo Bonzini, Yang Hongyang, Dr. David Alan Gilbert,
	Lai Jiangshan

At 2015/6/22 20:39, Stefan Hajnoczi Wrote:
> On Sat, Jun 20, 2015 at 11:31:52AM +0800, Wen Congyang wrote:
>> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
>>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
>>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
>>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>>>>>> +{
>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>> +
>>>>>>>> +    if (drv && drv->bdrv_connect) {
>>>>>>>> +        drv->bdrv_connect(bs, errp);
>>>>>>>> +    } else if (bs->file) {
>>>>>>>> +        bdrv_connect(bs->file, errp);
>>>>>>>> +    } else {
>>>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>>>>>> +{
>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>> +
>>>>>>>> +    if (drv && drv->bdrv_disconnect) {
>>>>>>>> +        drv->bdrv_disconnect(bs);
>>>>>>>> +    } else if (bs->file) {
>>>>>>>> +        bdrv_disconnect(bs->file);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>
>>>>>>> Please add doc comments describing the semantics of these commands.
>>>>>>
>>>>>> Where should it be documented? In the header file?
>>>>>
>>>>> block.h doesn't document prototypes in the header file, please document
>>>>> the function definition in block.c.  (QEMU is not consistent here, some
>>>>> places do it the other way around.)
>>>>>
>>>>>>> Why are these operations needed when there is already a bs->drv == NULL
>>>>>>> case which means the BDS is not ready for read/write?
>>>>>>>
>>>>>>
>>>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
>>>>>> connect/disconnect
>>>>>> to nbd server when we need to do it.
>>>>>>
>>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>>>>>> connect/disconnect
>>>>>> means that connect/disconnect to remote target(The target may be in another
>>>>>> host).
>>>>>
>>>>> Connect/disconnect puts something on the QEMU command-line that isn't
>>>>> ready at startup time.
>>>>>
>>>>> How about using monitor commands to add objects when needed instead?
>>>>>
>>>>> That is cleaner because it doesn't introduce a new state (which is only
>>>>> implemented for nbd).
>>>>>
>>>>
>>>> The problem is that, nbd client is one child of quorum, and quorum must have more
>>>> than one child. The nbd server is not ready until colo is running.
>>>
>>> A monitor command to hot add/remove quorum children solves this problem
>>> and could also be used in other scenarios (e.g. user decides to take a
>>> quorum child offline).
>>>
>>
>> For replication case, we always do checkpoint again and again after
>> migration. If the
>> disk is not synced before migration, we will use disk mirgation or mirror
>> job to sync
>> it. So we cannot start block replication when migration is running. We need
>> that nbd
>> client is not ready when migration is running, and it is ready between
>> migration ends
>> and checkpoint begins. Using a monotir command add the nbd client will cause
>> larger
>> downtime. So if the nbd client has been added(only not connect to the nbd
>> server),
>> we can connect to nbd server automatically.
>
> I'm not sure I understand you.  Can you rephrase this?

We only use block replication after migration because we assume that the 
disk has been synced
before block replication is started, so we only forward new write 
requests via nbd.

>
> The NBD connect should not be performed during downtime, regardless of
> whether a monitor command is used or not.

Why?

Thanks
Wen Congyang

>
> Stefan
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-18 16:06       ` Stefan Hajnoczi
  2015-06-19  0:54         ` Wen Congyang
@ 2015-06-23  6:44         ` Wen Congyang
  1 sibling, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-23  6:44 UTC (permalink / raw)
  To: Stefan Hajnoczi, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>> +{
>>>> +    BlockDriver *drv = bs->drv;
>>>> +
>>>> +    if (drv && drv->bdrv_connect) {
>>>> +        drv->bdrv_connect(bs, errp);
>>>> +    } else if (bs->file) {
>>>> +        bdrv_connect(bs->file, errp);
>>>> +    } else {
>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>> +    }
>>>> +}
>>>> +
>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>> +{
>>>> +    BlockDriver *drv = bs->drv;
>>>> +
>>>> +    if (drv && drv->bdrv_disconnect) {
>>>> +        drv->bdrv_disconnect(bs);
>>>> +    } else if (bs->file) {
>>>> +        bdrv_disconnect(bs->file);
>>>> +    }
>>>> +}
>>>
>>> Please add doc comments describing the semantics of these commands.
>>
>> Where should it be documented? In the header file?
> 
> block.h doesn't document prototypes in the header file, please document
> the function definition in block.c.  (QEMU is not consistent here, some
> places do it the other way around.)
> 
>>> Why are these operations needed when there is already a bs->drv == NULL
>>> case which means the BDS is not ready for read/write?
>>>
>>
>> The purpos is that: don't connect to nbd server when opening a nbd client.
>> connect/disconnect
>> to nbd server when we need to do it.
>>
>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>> connect/disconnect
>> means that connect/disconnect to remote target(The target may be in another
>> host).
> 
> Connect/disconnect puts something on the QEMU command-line that isn't
> ready at startup time.
> 
> How about using monitor commands to add objects when needed instead?

We have decieded use this way here:
http://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02975.html

Thanks
Wen Congyang

> 
> That is cleaner because it doesn't introduce a new state (which is only
> implemented for nbd).
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-22 13:56                 ` Wen Congyang
@ 2015-06-23 13:42                   ` Stefan Hajnoczi
  2015-06-23 13:54                     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Hajnoczi @ 2015-06-23 13:42 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On Mon, Jun 22, 2015 at 09:56:24PM +0800, Wen Congyang wrote:
> >The NBD connect should not be performed during downtime, regardless of
> >whether a monitor command is used or not.
> 
> Why?

NBD connection establishment takes unbounded time.  It can extend the
downtime period and cause users to notice that the guest is unavailable.

The point of COLO is fault tolerance and high availability.  In order to
achieve that, COLO must minimize downtime.  If an operation can be
performed outside the migration downtime phase, then it should be done
outside.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-23 13:42                   ` Stefan Hajnoczi
@ 2015-06-23 13:54                     ` Wen Congyang
  0 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-23 13:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, zhanghailiang

At 2015/6/23 21:42, Stefan Hajnoczi Wrote:
> On Mon, Jun 22, 2015 at 09:56:24PM +0800, Wen Congyang wrote:
>>> The NBD connect should not be performed during downtime, regardless of
>>> whether a monitor command is used or not.
>>
>> Why?
>
> NBD connection establishment takes unbounded time.  It can extend the
> downtime period and cause users to notice that the guest is unavailable.
>
> The point of COLO is fault tolerance and high availability.  In order to
> achieve that, COLO must minimize downtime.  If an operation can be
> performed outside the migration downtime phase, then it should be done
> outside.

Yes, you are right. I will think about it.

Thanks
Wen Congyang

>
> Stefan
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-19 10:49           ` Stefan Hajnoczi
  2015-06-20  3:31             ` Wen Congyang
@ 2015-06-24  1:11             ` Wen Congyang
  1 sibling, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-06-24  1:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Alberto Garcia,
	zhanghailiang, Gonglei, Paolo Bonzini, Yang Hongyang,
	Dr. David Alan Gilbert, Lai Jiangshan

On 06/19/2015 06:49 PM, Stefan Hajnoczi wrote:
> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>>>> +{
>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>> +
>>>>>> +    if (drv && drv->bdrv_connect) {
>>>>>> +        drv->bdrv_connect(bs, errp);
>>>>>> +    } else if (bs->file) {
>>>>>> +        bdrv_connect(bs->file, errp);
>>>>>> +    } else {
>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>>>> +{
>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>> +
>>>>>> +    if (drv && drv->bdrv_disconnect) {
>>>>>> +        drv->bdrv_disconnect(bs);
>>>>>> +    } else if (bs->file) {
>>>>>> +        bdrv_disconnect(bs->file);
>>>>>> +    }
>>>>>> +}
>>>>>
>>>>> Please add doc comments describing the semantics of these commands.
>>>>
>>>> Where should it be documented? In the header file?
>>>
>>> block.h doesn't document prototypes in the header file, please document
>>> the function definition in block.c.  (QEMU is not consistent here, some
>>> places do it the other way around.)
>>>
>>>>> Why are these operations needed when there is already a bs->drv == NULL
>>>>> case which means the BDS is not ready for read/write?
>>>>>
>>>>
>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
>>>> connect/disconnect
>>>> to nbd server when we need to do it.
>>>>
>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>>>> connect/disconnect
>>>> means that connect/disconnect to remote target(The target may be in another
>>>> host).
>>>
>>> Connect/disconnect puts something on the QEMU command-line that isn't
>>> ready at startup time.
>>>
>>> How about using monitor commands to add objects when needed instead?
>>>
>>> That is cleaner because it doesn't introduce a new state (which is only
>>> implemented for nbd).
>>>
>>
>> The problem is that, nbd client is one child of quorum, and quorum must have more
>> than one child. The nbd server is not ready until colo is running.
> 
> A monitor command to hot add/remove quorum children solves this problem
> and could also be used in other scenarios (e.g. user decides to take a
> quorum child offline).
> 


CCing Alberto Garcia for the quorum block driver.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-20  3:31             ` Wen Congyang
  2015-06-22 12:39               ` Stefan Hajnoczi
@ 2015-06-24 14:07               ` Dr. David Alan Gilbert
  2015-06-25  1:01                 ` Wen Congyang
  1 sibling, 1 reply; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-24 14:07 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Fam Zheng, Lai Jiangshan,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini, qemu block,
	zhanghailiang

* Wen Congyang (ghostwcy@gmail.com) wrote:
> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
> >On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
> >>On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> >>>On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
> >>>>At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> >>>>>On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> >>>>>>+void bdrv_connect(BlockDriverState *bs, Error **errp)
> >>>>>>+{
> >>>>>>+    BlockDriver *drv = bs->drv;
> >>>>>>+
> >>>>>>+    if (drv && drv->bdrv_connect) {
> >>>>>>+        drv->bdrv_connect(bs, errp);
> >>>>>>+    } else if (bs->file) {
> >>>>>>+        bdrv_connect(bs->file, errp);
> >>>>>>+    } else {
> >>>>>>+        error_setg(errp, "this feature or command is not currently supported");
> >>>>>>+    }
> >>>>>>+}
> >>>>>>+
> >>>>>>+void bdrv_disconnect(BlockDriverState *bs)
> >>>>>>+{
> >>>>>>+    BlockDriver *drv = bs->drv;
> >>>>>>+
> >>>>>>+    if (drv && drv->bdrv_disconnect) {
> >>>>>>+        drv->bdrv_disconnect(bs);
> >>>>>>+    } else if (bs->file) {
> >>>>>>+        bdrv_disconnect(bs->file);
> >>>>>>+    }
> >>>>>>+}
> >>>>>
> >>>>>Please add doc comments describing the semantics of these commands.
> >>>>
> >>>>Where should it be documented? In the header file?
> >>>
> >>>block.h doesn't document prototypes in the header file, please document
> >>>the function definition in block.c.  (QEMU is not consistent here, some
> >>>places do it the other way around.)
> >>>
> >>>>>Why are these operations needed when there is already a bs->drv == NULL
> >>>>>case which means the BDS is not ready for read/write?
> >>>>>
> >>>>
> >>>>The purpos is that: don't connect to nbd server when opening a nbd client.
> >>>>connect/disconnect
> >>>>to nbd server when we need to do it.
> >>>>
> >>>>IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
> >>>>connect/disconnect
> >>>>means that connect/disconnect to remote target(The target may be in another
> >>>>host).
> >>>
> >>>Connect/disconnect puts something on the QEMU command-line that isn't
> >>>ready at startup time.
> >>>
> >>>How about using monitor commands to add objects when needed instead?
> >>>
> >>>That is cleaner because it doesn't introduce a new state (which is only
> >>>implemented for nbd).
> >>>
> >>
> >>The problem is that, nbd client is one child of quorum, and quorum must have more
> >>than one child. The nbd server is not ready until colo is running.
> >
> >A monitor command to hot add/remove quorum children solves this problem
> >and could also be used in other scenarios (e.g. user decides to take a
> >quorum child offline).
> >
> 
> For replication case, we always do checkpoint again and again after
> migration. If the disk is not synced before migration, we will use disk mirgation or mirror
> job to sync it.

Can you document the way that you use disk migration or mirror, together
with your COLO setup?   I think it would make it easier to understand this
restriction.
At the moment I don't understand how you can switch from doing a disk migration
into COLO mode - there seems to be a gap between the end of disk migration and the start
of COLO.

> So we cannot start block replication when migration is running. We need
> that nbd
> client is not ready when migration is running, and it is ready between
> migration ends
> and checkpoint begins. Using a monotir command add the nbd client will cause
> larger
> downtime. So if the nbd client has been added(only not connect to the nbd
> server),
> we can connect to nbd server automatically.

Without the disk migration or mirror, I can't see the need for the delayed bdrv_connect.
I can see that you want to do a disconnect at failover, however you need
to take care because if the network is broken at the point of failover
you need to make sure nothing blocks.

Dave

> 
> Thanks
> Wen Congyang
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-24 14:07               ` Dr. David Alan Gilbert
@ 2015-06-25  1:01                 ` Wen Congyang
  2015-06-26 19:03                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-25  1:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Lai Jiangshan, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie, qemu devel,
	Max Reitz, Gonglei, Stefan Hajnoczi, Paolo Bonzini,
	Yang Hongyang, zhanghailiang

On 06/24/2015 10:07 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (ghostwcy@gmail.com) wrote:
>> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
>>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
>>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
>>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>>>>>> +{
>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>> +
>>>>>>>> +    if (drv && drv->bdrv_connect) {
>>>>>>>> +        drv->bdrv_connect(bs, errp);
>>>>>>>> +    } else if (bs->file) {
>>>>>>>> +        bdrv_connect(bs->file, errp);
>>>>>>>> +    } else {
>>>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>>>>>> +{
>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>> +
>>>>>>>> +    if (drv && drv->bdrv_disconnect) {
>>>>>>>> +        drv->bdrv_disconnect(bs);
>>>>>>>> +    } else if (bs->file) {
>>>>>>>> +        bdrv_disconnect(bs->file);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>
>>>>>>> Please add doc comments describing the semantics of these commands.
>>>>>>
>>>>>> Where should it be documented? In the header file?
>>>>>
>>>>> block.h doesn't document prototypes in the header file, please document
>>>>> the function definition in block.c.  (QEMU is not consistent here, some
>>>>> places do it the other way around.)
>>>>>
>>>>>>> Why are these operations needed when there is already a bs->drv == NULL
>>>>>>> case which means the BDS is not ready for read/write?
>>>>>>>
>>>>>>
>>>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
>>>>>> connect/disconnect
>>>>>> to nbd server when we need to do it.
>>>>>>
>>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>>>>>> connect/disconnect
>>>>>> means that connect/disconnect to remote target(The target may be in another
>>>>>> host).
>>>>>
>>>>> Connect/disconnect puts something on the QEMU command-line that isn't
>>>>> ready at startup time.
>>>>>
>>>>> How about using monitor commands to add objects when needed instead?
>>>>>
>>>>> That is cleaner because it doesn't introduce a new state (which is only
>>>>> implemented for nbd).
>>>>>
>>>>
>>>> The problem is that, nbd client is one child of quorum, and quorum must have more
>>>> than one child. The nbd server is not ready until colo is running.
>>>
>>> A monitor command to hot add/remove quorum children solves this problem
>>> and could also be used in other scenarios (e.g. user decides to take a
>>> quorum child offline).
>>>
>>
>> For replication case, we always do checkpoint again and again after
>> migration. If the disk is not synced before migration, we will use disk mirgation or mirror
>> job to sync it.
> 
> Can you document the way that you use disk migration or mirror, together
> with your COLO setup?   I think it would make it easier to understand this
> restriction.
> At the moment I don't understand how you can switch from doing a disk migration
> into COLO mode - there seems to be a gap between the end of disk migration and the start
> of COLO.

In my test, I use disk migration.

After migration and before starting COLO, we will disable disk migration:
+    /* Disable block migration */
+    s->params.blk = 0;
+    s->params.shared = 0;
+    qemu_savevm_state_begin(trans, &s->params);

If the user uses mirror job, we don't cancel the mirror job now.

> 
>> So we cannot start block replication when migration is running. We need
>> that nbd
>> client is not ready when migration is running, and it is ready between
>> migration ends
>> and checkpoint begins. Using a monotir command add the nbd client will cause
>> larger
>> downtime. So if the nbd client has been added(only not connect to the nbd
>> server),
>> we can connect to nbd server automatically.
> 
> Without the disk migration or mirror, I can't see the need for the delayed bdrv_connect.
> I can see that you want to do a disconnect at failover, however you need
> to take care because if the network is broken at the point of failover
> you need to make sure nothing blocks.

disk migration is needed if the disk is not synced before migration.

Thanks
Wen Congyang

> 
> Dave
> 
>>
>> Thanks
>> Wen Congyang
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-25  1:01                 ` Wen Congyang
@ 2015-06-26 19:03                   ` Dr. David Alan Gilbert
  2015-06-29  1:05                     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-26 19:03 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, zhanghailiang,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Lai Jiangshan

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 06/24/2015 10:07 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (ghostwcy@gmail.com) wrote:
> >> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
> >>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
> >>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
> >>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
> >>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
> >>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
> >>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
> >>>>>>>> +{
> >>>>>>>> +    BlockDriver *drv = bs->drv;
> >>>>>>>> +
> >>>>>>>> +    if (drv && drv->bdrv_connect) {
> >>>>>>>> +        drv->bdrv_connect(bs, errp);
> >>>>>>>> +    } else if (bs->file) {
> >>>>>>>> +        bdrv_connect(bs->file, errp);
> >>>>>>>> +    } else {
> >>>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
> >>>>>>>> +    }
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
> >>>>>>>> +{
> >>>>>>>> +    BlockDriver *drv = bs->drv;
> >>>>>>>> +
> >>>>>>>> +    if (drv && drv->bdrv_disconnect) {
> >>>>>>>> +        drv->bdrv_disconnect(bs);
> >>>>>>>> +    } else if (bs->file) {
> >>>>>>>> +        bdrv_disconnect(bs->file);
> >>>>>>>> +    }
> >>>>>>>> +}
> >>>>>>>
> >>>>>>> Please add doc comments describing the semantics of these commands.
> >>>>>>
> >>>>>> Where should it be documented? In the header file?
> >>>>>
> >>>>> block.h doesn't document prototypes in the header file, please document
> >>>>> the function definition in block.c.  (QEMU is not consistent here, some
> >>>>> places do it the other way around.)
> >>>>>
> >>>>>>> Why are these operations needed when there is already a bs->drv == NULL
> >>>>>>> case which means the BDS is not ready for read/write?
> >>>>>>>
> >>>>>>
> >>>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
> >>>>>> connect/disconnect
> >>>>>> to nbd server when we need to do it.
> >>>>>>
> >>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
> >>>>>> connect/disconnect
> >>>>>> means that connect/disconnect to remote target(The target may be in another
> >>>>>> host).
> >>>>>
> >>>>> Connect/disconnect puts something on the QEMU command-line that isn't
> >>>>> ready at startup time.
> >>>>>
> >>>>> How about using monitor commands to add objects when needed instead?
> >>>>>
> >>>>> That is cleaner because it doesn't introduce a new state (which is only
> >>>>> implemented for nbd).
> >>>>>
> >>>>
> >>>> The problem is that, nbd client is one child of quorum, and quorum must have more
> >>>> than one child. The nbd server is not ready until colo is running.
> >>>
> >>> A monitor command to hot add/remove quorum children solves this problem
> >>> and could also be used in other scenarios (e.g. user decides to take a
> >>> quorum child offline).
> >>>
> >>
> >> For replication case, we always do checkpoint again and again after
> >> migration. If the disk is not synced before migration, we will use disk mirgation or mirror
> >> job to sync it.
> > 
> > Can you document the way that you use disk migration or mirror, together
> > with your COLO setup?   I think it would make it easier to understand this
> > restriction.
> > At the moment I don't understand how you can switch from doing a disk migration
> > into COLO mode - there seems to be a gap between the end of disk migration and the start
> > of COLO.
> 
> In my test, I use disk migration.
> 
> After migration and before starting COLO, we will disable disk migration:
> +    /* Disable block migration */
> +    s->params.blk = 0;
> +    s->params.shared = 0;
> +    qemu_savevm_state_begin(trans, &s->params);

Ah, I hadn't realised you could do that; so do you just do:

migrate_set_parameter colo on
migrate -d -b tcp:otherhhost:port

How does the secondary know to feed that data straight into the disk without
recording all the old data into the hidden-disk ?

> If the user uses mirror job, we don't cancel the mirror job now.

It would be good to get it to work with mirror, that seems preferred these
days to the old block migration.

With block migration or mirror, then that pretty much allows you to replace
a failed secondary doesn't it without restarting?  The only thing stopping
you is the NBD client needs to point to the address of the new secondary.
Stefan's suggestion of being able to modify the quorum on the fly would seem
to fix that problem.
(The other thing I can see is that the secondary wouldn't have the conntrack
data for open connections; but that's a separate problem).

A related topic was that a colleague was asking about starting the qemu
up and only then setting the NBD target (he's trying to wire it into OpenStack);
I suggested that instead of specifying the drive on the command line, it
might work to use a drive_add to add the whole quorum/drive stack before starting
the guest; however again something to modify the quorum would be easier.

Finally the last good reason I can see for being able to modify the quorum on the fly
is that you'll need it for continuous ft to be able to transmute a secondary into
a new primary.

Dave

> 
> > 
> >> So we cannot start block replication when migration is running. We need
> >> that nbd
> >> client is not ready when migration is running, and it is ready between
> >> migration ends
> >> and checkpoint begins. Using a monotir command add the nbd client will cause
> >> larger
> >> downtime. So if the nbd client has been added(only not connect to the nbd
> >> server),
> >> we can connect to nbd server automatically.
> > 
> > Without the disk migration or mirror, I can't see the need for the delayed bdrv_connect.
> > I can see that you want to do a disconnect at failover, however you need
> > to take care because if the network is broken at the point of failover
> > you need to make sure nothing blocks.
> 
> disk migration is needed if the disk is not synced before migration.
> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> >>
> >> Thanks
> >> Wen Congyang
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-26 19:03                   ` Dr. David Alan Gilbert
@ 2015-06-29  1:05                     ` Wen Congyang
  2015-06-30 19:01                       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-06-29  1:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, zhanghailiang,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	Lai Jiangshan

On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 06/24/2015 10:07 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (ghostwcy@gmail.com) wrote:
>>>> At 2015/6/19 18:49, Stefan Hajnoczi Wrote:
>>>>> On Fri, Jun 19, 2015 at 08:54:56AM +0800, Wen Congyang wrote:
>>>>>> On 06/19/2015 12:06 AM, Stefan Hajnoczi wrote:
>>>>>>> On Thu, Jun 18, 2015 at 10:36:39PM +0800, Wen Congyang wrote:
>>>>>>>> At 2015/6/18 20:55, Stefan Hajnoczi Wrote:
>>>>>>>>> On Thu, Jun 18, 2015 at 04:49:12PM +0800, Wen Congyang wrote:
>>>>>>>>>> +void bdrv_connect(BlockDriverState *bs, Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>>>> +
>>>>>>>>>> +    if (drv && drv->bdrv_connect) {
>>>>>>>>>> +        drv->bdrv_connect(bs, errp);
>>>>>>>>>> +    } else if (bs->file) {
>>>>>>>>>> +        bdrv_connect(bs->file, errp);
>>>>>>>>>> +    } else {
>>>>>>>>>> +        error_setg(errp, "this feature or command is not currently supported");
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +void bdrv_disconnect(BlockDriverState *bs)
>>>>>>>>>> +{
>>>>>>>>>> +    BlockDriver *drv = bs->drv;
>>>>>>>>>> +
>>>>>>>>>> +    if (drv && drv->bdrv_disconnect) {
>>>>>>>>>> +        drv->bdrv_disconnect(bs);
>>>>>>>>>> +    } else if (bs->file) {
>>>>>>>>>> +        bdrv_disconnect(bs->file);
>>>>>>>>>> +    }
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>> Please add doc comments describing the semantics of these commands.
>>>>>>>>
>>>>>>>> Where should it be documented? In the header file?
>>>>>>>
>>>>>>> block.h doesn't document prototypes in the header file, please document
>>>>>>> the function definition in block.c.  (QEMU is not consistent here, some
>>>>>>> places do it the other way around.)
>>>>>>>
>>>>>>>>> Why are these operations needed when there is already a bs->drv == NULL
>>>>>>>>> case which means the BDS is not ready for read/write?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The purpos is that: don't connect to nbd server when opening a nbd client.
>>>>>>>> connect/disconnect
>>>>>>>> to nbd server when we need to do it.
>>>>>>>>
>>>>>>>> IIUC, if bs->drv is NULL, it means that the driver is ejected? Here,
>>>>>>>> connect/disconnect
>>>>>>>> means that connect/disconnect to remote target(The target may be in another
>>>>>>>> host).
>>>>>>>
>>>>>>> Connect/disconnect puts something on the QEMU command-line that isn't
>>>>>>> ready at startup time.
>>>>>>>
>>>>>>> How about using monitor commands to add objects when needed instead?
>>>>>>>
>>>>>>> That is cleaner because it doesn't introduce a new state (which is only
>>>>>>> implemented for nbd).
>>>>>>>
>>>>>>
>>>>>> The problem is that, nbd client is one child of quorum, and quorum must have more
>>>>>> than one child. The nbd server is not ready until colo is running.
>>>>>
>>>>> A monitor command to hot add/remove quorum children solves this problem
>>>>> and could also be used in other scenarios (e.g. user decides to take a
>>>>> quorum child offline).
>>>>>
>>>>
>>>> For replication case, we always do checkpoint again and again after
>>>> migration. If the disk is not synced before migration, we will use disk mirgation or mirror
>>>> job to sync it.
>>>
>>> Can you document the way that you use disk migration or mirror, together
>>> with your COLO setup?   I think it would make it easier to understand this
>>> restriction.
>>> At the moment I don't understand how you can switch from doing a disk migration
>>> into COLO mode - there seems to be a gap between the end of disk migration and the start
>>> of COLO.
>>
>> In my test, I use disk migration.
>>
>> After migration and before starting COLO, we will disable disk migration:
>> +    /* Disable block migration */
>> +    s->params.blk = 0;
>> +    s->params.shared = 0;
>> +    qemu_savevm_state_begin(trans, &s->params);
> 
> Ah, I hadn't realised you could do that; so do you just do:
> 
> migrate_set_parameter colo on
> migrate -d -b tcp:otherhhost:port
> 
> How does the secondary know to feed that data straight into the disk without
> recording all the old data into the hidden-disk ?

hidden disk and active disk will be made empty when starting block replication.

> 
>> If the user uses mirror job, we don't cancel the mirror job now.
> 
> It would be good to get it to work with mirror, that seems preferred these
> days to the old block migration.

In normal migration, is mirror job created and cancelled by libvirt?

> 
> With block migration or mirror, then that pretty much allows you to replace
> a failed secondary doesn't it without restarting?  The only thing stopping
> you is the NBD client needs to point to the address of the new secondary.
> Stefan's suggestion of being able to modify the quorum on the fly would seem
> to fix that problem.

Yes, I am implementing adding/removing quorum child now.

> (The other thing I can see is that the secondary wouldn't have the conntrack
> data for open connections; but that's a separate problem).
> 
> A related topic was that a colleague was asking about starting the qemu
> up and only then setting the NBD target (he's trying to wire it into OpenStack);
> I suggested that instead of specifying the drive on the command line, it
> might work to use a drive_add to add the whole quorum/drive stack before starting
> the guest; however again something to modify the quorum would be easier.
> 
> Finally the last good reason I can see for being able to modify the quorum on the fly
> is that you'll need it for continuous ft to be able to transmute a secondary into
> a new primary.

Yes, I think adding a filter on the top is also needed.

Thanks
Wen Congyang

> 
> Dave
> 
>>
>>>
>>>> So we cannot start block replication when migration is running. We need
>>>> that nbd
>>>> client is not ready when migration is running, and it is ready between
>>>> migration ends
>>>> and checkpoint begins. Using a monotir command add the nbd client will cause
>>>> larger
>>>> downtime. So if the nbd client has been added(only not connect to the nbd
>>>> server),
>>>> we can connect to nbd server automatically.
>>>
>>> Without the disk migration or mirror, I can't see the need for the delayed bdrv_connect.
>>> I can see that you want to do a disconnect at failover, however you need
>>> to take care because if the network is broken at the point of failover
>>> you need to make sure nothing blocks.
>>
>> disk migration is needed if the disk is not synced before migration.
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Dave
>>>
>>>>
>>>> Thanks
>>>> Wen Congyang
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-29  1:05                     ` Wen Congyang
@ 2015-06-30 19:01                       ` Dr. David Alan Gilbert
  2015-07-01  1:01                         ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2015-06-30 19:01 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:

<snip>

> > Ah, I hadn't realised you could do that; so do you just do:
> > 
> > migrate_set_parameter colo on
> > migrate -d -b tcp:otherhhost:port
> > 
> > How does the secondary know to feed that data straight into the disk without
> > recording all the old data into the hidden-disk ?
> 
> hidden disk and active disk will be made empty when starting block replication.

Hmm, yes - I think I need to update to your current world; in the version
from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
if I try to use migrate -d -b 
(the bdrv_write fails)

> >> If the user uses mirror job, we don't cancel the mirror job now.
> > 
> > It would be good to get it to work with mirror, that seems preferred these
> > days to the old block migration.
> 
> In normal migration, is mirror job created and cancelled by libvirt?

Yes, I think so; you should be able to turn on full logging on libvirt and
watch the qmp commands it sends.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-06-30 19:01                       ` Dr. David Alan Gilbert
@ 2015-07-01  1:01                         ` Wen Congyang
  2015-07-01  8:11                           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-07-01  1:01 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
> 
> <snip>
> 
>>> Ah, I hadn't realised you could do that; so do you just do:
>>>
>>> migrate_set_parameter colo on
>>> migrate -d -b tcp:otherhhost:port
>>>
>>> How does the secondary know to feed that data straight into the disk without
>>> recording all the old data into the hidden-disk ?
>>
>> hidden disk and active disk will be made empty when starting block replication.
> 
> Hmm, yes - I think I need to update to your current world; in the version
> from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
> if I try to use migrate -d -b 
> (the bdrv_write fails)

Can you give me both primary and secondary qemu's command? I think the command line is wrong,
and disk migration fails.

> 
>>>> If the user uses mirror job, we don't cancel the mirror job now.
>>>
>>> It would be good to get it to work with mirror, that seems preferred these
>>> days to the old block migration.
>>
>> In normal migration, is mirror job created and cancelled by libvirt?
> 
> Yes, I think so; you should be able to turn on full logging on libvirt and
> watch the qmp commands it sends.

Supporting mirror job in my TODO list now. But I think we should focus the basci function now.

Thanks
Wen Congyang

> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-07-01  1:01                         ` Wen Congyang
@ 2015-07-01  8:11                           ` Dr. David Alan Gilbert
  2015-07-01  8:18                             ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-01  8:11 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
> > 
> > <snip>
> > 
> >>> Ah, I hadn't realised you could do that; so do you just do:
> >>>
> >>> migrate_set_parameter colo on
> >>> migrate -d -b tcp:otherhhost:port
> >>>
> >>> How does the secondary know to feed that data straight into the disk without
> >>> recording all the old data into the hidden-disk ?
> >>
> >> hidden disk and active disk will be made empty when starting block replication.
> > 
> > Hmm, yes - I think I need to update to your current world; in the version
> > from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
> > if I try to use migrate -d -b 
> > (the bdrv_write fails)
> 
> Can you give me both primary and secondary qemu's command? I think the command line is wrong,
> and disk migration fails.
> 

Primary:

./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
     -boot c -m 4096 -smp 4 -S \
     -name debug-threads=on -trace events=trace-file \
     -netdev tap,id=hn0,script=$PWD/ifup-prim,\
downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
     -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
     -drive if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,\
cache=none,aio=native,\
children.0.file.filename=./bugzilla.raw,\
children.0.driver=raw,\
children.1.file.driver=nbd,\
children.1.file.host=ibpair,\
children.1.file.port=8889,\
children.1.file.export=colo1,\
children.1.driver=replication,\
children.1.mode=primary,\
children.1.ignore-errors=on


Secondary:

./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
     -boot c -m 4096 -smp 4 -S \
     -name debug-threads=on -trace events=trace-file \
     -netdev tap,id=hn0,script=$PWD/ifup-slave,\
downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
     -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
     -drive if=none,driver=raw,file=bugzilla.raw,id=nbd_target1,cache=none,aio=native \
     -drive if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=70000000,\
file.file.filename=/run/colo-active-disk.qcow2,\
file.driver=qcow2,\
file.backing_reference.drive_id=nbd_target1,\
file.backing_reference.hidden-disk.file.filename=/run/colo-hidden-disk.qcow2,\
file.backing_reference.hidden-disk.driver=qcow2,\
file.backing_reference.hidden-disk.allow-write-backing-file=on \
     -incoming tcp:0:8888


Thanks,

Dave

> >>>> If the user uses mirror job, we don't cancel the mirror job now.
> >>>
> >>> It would be good to get it to work with mirror, that seems preferred these
> >>> days to the old block migration.
> >>
> >> In normal migration, is mirror job created and cancelled by libvirt?
> > 
> > Yes, I think so; you should be able to turn on full logging on libvirt and
> > watch the qmp commands it sends.
> 
> Supporting mirror job in my TODO list now. But I think we should focus the basci function now.
> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-07-01  8:11                           ` Dr. David Alan Gilbert
@ 2015-07-01  8:18                             ` Wen Congyang
  2015-07-01 18:42                               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-07-01  8:18 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

On 07/01/2015 04:11 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
>>>
>>> <snip>
>>>
>>>>> Ah, I hadn't realised you could do that; so do you just do:
>>>>>
>>>>> migrate_set_parameter colo on
>>>>> migrate -d -b tcp:otherhhost:port
>>>>>
>>>>> How does the secondary know to feed that data straight into the disk without
>>>>> recording all the old data into the hidden-disk ?
>>>>
>>>> hidden disk and active disk will be made empty when starting block replication.
>>>
>>> Hmm, yes - I think I need to update to your current world; in the version
>>> from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
>>> if I try to use migrate -d -b 
>>> (the bdrv_write fails)
>>
>> Can you give me both primary and secondary qemu's command? I think the command line is wrong,
>> and disk migration fails.
>>
> 
> Primary:
> 
> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
>      -boot c -m 4096 -smp 4 -S \
>      -name debug-threads=on -trace events=trace-file \
>      -netdev tap,id=hn0,script=$PWD/ifup-prim,\
> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
>      -drive if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,\
> cache=none,aio=native,\
> children.0.file.filename=./bugzilla.raw,\
> children.0.driver=raw,\
> children.1.file.driver=nbd,\
> children.1.file.host=ibpair,\
> children.1.file.port=8889,\
> children.1.file.export=colo1,\
> children.1.driver=replication,\
> children.1.mode=primary,\
> children.1.ignore-errors=on

Add id=nbd_target1 to primary disk option, and try it. Disk migration needs the same id to sync
the disk.

Thanks
Wen Congyang

> 
> 
> Secondary:
> 
> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
>      -boot c -m 4096 -smp 4 -S \
>      -name debug-threads=on -trace events=trace-file \
>      -netdev tap,id=hn0,script=$PWD/ifup-slave,\
> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
>      -drive if=none,driver=raw,file=bugzilla.raw,id=nbd_target1,cache=none,aio=native \
>      -drive if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=70000000,\
> file.file.filename=/run/colo-active-disk.qcow2,\
> file.driver=qcow2,\
> file.backing_reference.drive_id=nbd_target1,\
> file.backing_reference.hidden-disk.file.filename=/run/colo-hidden-disk.qcow2,\
> file.backing_reference.hidden-disk.driver=qcow2,\
> file.backing_reference.hidden-disk.allow-write-backing-file=on \
>      -incoming tcp:0:8888
> 
> 
> Thanks,
> 
> Dave
> 
>>>>>> If the user uses mirror job, we don't cancel the mirror job now.
>>>>>
>>>>> It would be good to get it to work with mirror, that seems preferred these
>>>>> days to the old block migration.
>>>>
>>>> In normal migration, is mirror job created and cancelled by libvirt?
>>>
>>> Yes, I think so; you should be able to turn on full logging on libvirt and
>>> watch the qmp commands it sends.
>>
>> Supporting mirror job in my TODO list now. But I think we should focus the basci function now.
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Dave
>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-07-01  8:18                             ` Wen Congyang
@ 2015-07-01 18:42                               ` Dr. David Alan Gilbert
  2015-07-02  0:55                                 ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-01 18:42 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 07/01/2015 04:11 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
> >>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>>> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
> >>>
> >>> <snip>
> >>>
> >>>>> Ah, I hadn't realised you could do that; so do you just do:
> >>>>>
> >>>>> migrate_set_parameter colo on
> >>>>> migrate -d -b tcp:otherhhost:port
> >>>>>
> >>>>> How does the secondary know to feed that data straight into the disk without
> >>>>> recording all the old data into the hidden-disk ?
> >>>>
> >>>> hidden disk and active disk will be made empty when starting block replication.
> >>>
> >>> Hmm, yes - I think I need to update to your current world; in the version
> >>> from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
> >>> if I try to use migrate -d -b 
> >>> (the bdrv_write fails)
> >>
> >> Can you give me both primary and secondary qemu's command? I think the command line is wrong,
> >> and disk migration fails.
> >>
> > 
> > Primary:
> > 
> > ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
> >      -boot c -m 4096 -smp 4 -S \
> >      -name debug-threads=on -trace events=trace-file \
> >      -netdev tap,id=hn0,script=$PWD/ifup-prim,\
> > downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
> >      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
> >      -drive if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,\
> > cache=none,aio=native,\
> > children.0.file.filename=./bugzilla.raw,\
> > children.0.driver=raw,\
> > children.1.file.driver=nbd,\
> > children.1.file.host=ibpair,\
> > children.1.file.port=8889,\
> > children.1.file.export=colo1,\
> > children.1.driver=replication,\
> > children.1.mode=primary,\
> > children.1.ignore-errors=on
> 
> Add id=nbd_target1 to primary disk option, and try it. Disk migration needs the same id to sync
> the disk.

Thank you! That worked nicely.
The only odd thing was that the hidden disk on the secondary went upto ~2GB in size during
the disk copy; (This is still on the version you posted at the end of may).
I don't really understand why it was 2GB - the disk was 40GB, and qemu-img tells me that
2.6GB of it were used.   Still, it would be good to avoid the overhead of going through
the hidden disk on the secondary for the initial replication.

Dave

> Thanks
> Wen Congyang
> 
> > 
> > 
> > Secondary:
> > 
> > ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
> >      -boot c -m 4096 -smp 4 -S \
> >      -name debug-threads=on -trace events=trace-file \
> >      -netdev tap,id=hn0,script=$PWD/ifup-slave,\
> > downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
> >      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
> >      -drive if=none,driver=raw,file=bugzilla.raw,id=nbd_target1,cache=none,aio=native \
> >      -drive if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=70000000,\
> > file.file.filename=/run/colo-active-disk.qcow2,\
> > file.driver=qcow2,\
> > file.backing_reference.drive_id=nbd_target1,\
> > file.backing_reference.hidden-disk.file.filename=/run/colo-hidden-disk.qcow2,\
> > file.backing_reference.hidden-disk.driver=qcow2,\
> > file.backing_reference.hidden-disk.allow-write-backing-file=on \
> >      -incoming tcp:0:8888
> > 
> > 
> > Thanks,
> > 
> > Dave
> > 
> >>>>>> If the user uses mirror job, we don't cancel the mirror job now.
> >>>>>
> >>>>> It would be good to get it to work with mirror, that seems preferred these
> >>>>> days to the old block migration.
> >>>>
> >>>> In normal migration, is mirror job created and cancelled by libvirt?
> >>>
> >>> Yes, I think so; you should be able to turn on full logging on libvirt and
> >>> watch the qmp commands it sends.
> >>
> >> Supporting mirror job in my TODO list now. But I think we should focus the basci function now.
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>> Dave
> >>>
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>> .
> >>>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-07-01 18:42                               ` Dr. David Alan Gilbert
@ 2015-07-02  0:55                                 ` Wen Congyang
  2015-07-02  7:54                                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 50+ messages in thread
From: Wen Congyang @ 2015-07-02  0:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

On 07/02/2015 02:42 AM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 07/01/2015 04:11 PM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>> On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
>>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>>> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
>>>>>
>>>>> <snip>
>>>>>
>>>>>>> Ah, I hadn't realised you could do that; so do you just do:
>>>>>>>
>>>>>>> migrate_set_parameter colo on
>>>>>>> migrate -d -b tcp:otherhhost:port
>>>>>>>
>>>>>>> How does the secondary know to feed that data straight into the disk without
>>>>>>> recording all the old data into the hidden-disk ?
>>>>>>
>>>>>> hidden disk and active disk will be made empty when starting block replication.
>>>>>
>>>>> Hmm, yes - I think I need to update to your current world; in the version
>>>>> from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
>>>>> if I try to use migrate -d -b 
>>>>> (the bdrv_write fails)
>>>>
>>>> Can you give me both primary and secondary qemu's command? I think the command line is wrong,
>>>> and disk migration fails.
>>>>
>>>
>>> Primary:
>>>
>>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
>>>      -boot c -m 4096 -smp 4 -S \
>>>      -name debug-threads=on -trace events=trace-file \
>>>      -netdev tap,id=hn0,script=$PWD/ifup-prim,\
>>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
>>>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
>>>      -drive if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,\
>>> cache=none,aio=native,\
>>> children.0.file.filename=./bugzilla.raw,\
>>> children.0.driver=raw,\
>>> children.1.file.driver=nbd,\
>>> children.1.file.host=ibpair,\
>>> children.1.file.port=8889,\
>>> children.1.file.export=colo1,\
>>> children.1.driver=replication,\
>>> children.1.mode=primary,\
>>> children.1.ignore-errors=on
>>
>> Add id=nbd_target1 to primary disk option, and try it. Disk migration needs the same id to sync
>> the disk.
> 
> Thank you! That worked nicely.
> The only odd thing was that the hidden disk on the secondary went upto ~2GB in size during
> the disk copy; (This is still on the version you posted at the end of may).
> I don't really understand why it was 2GB - the disk was 40GB, and qemu-img tells me that
> 2.6GB of it were used.   Still, it would be good to avoid the overhead of going through
> the hidden disk on the secondary for the initial replication.

Yes, I have fixed it in v7: starting backup job later.

Thanks
Wen Congyang

> 
> Dave
> 
>> Thanks
>> Wen Congyang
>>
>>>
>>>
>>> Secondary:
>>>
>>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
>>>      -boot c -m 4096 -smp 4 -S \
>>>      -name debug-threads=on -trace events=trace-file \
>>>      -netdev tap,id=hn0,script=$PWD/ifup-slave,\
>>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
>>>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
>>>      -drive if=none,driver=raw,file=bugzilla.raw,id=nbd_target1,cache=none,aio=native \
>>>      -drive if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=70000000,\
>>> file.file.filename=/run/colo-active-disk.qcow2,\
>>> file.driver=qcow2,\
>>> file.backing_reference.drive_id=nbd_target1,\
>>> file.backing_reference.hidden-disk.file.filename=/run/colo-hidden-disk.qcow2,\
>>> file.backing_reference.hidden-disk.driver=qcow2,\
>>> file.backing_reference.hidden-disk.allow-write-backing-file=on \
>>>      -incoming tcp:0:8888
>>>
>>>
>>> Thanks,
>>>
>>> Dave
>>>
>>>>>>>> If the user uses mirror job, we don't cancel the mirror job now.
>>>>>>>
>>>>>>> It would be good to get it to work with mirror, that seems preferred these
>>>>>>> days to the old block migration.
>>>>>>
>>>>>> In normal migration, is mirror job created and cancelled by libvirt?
>>>>>
>>>>> Yes, I think so; you should be able to turn on full logging on libvirt and
>>>>> watch the qmp commands it sends.
>>>>
>>>> Supporting mirror job in my TODO list now. But I think we should focus the basci function now.
>>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>> Dave
>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>> .
>>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-07-02  0:55                                 ` Wen Congyang
@ 2015-07-02  7:54                                   ` Dr. David Alan Gilbert
  2015-07-02  8:05                                     ` Wen Congyang
  0 siblings, 1 reply; 50+ messages in thread
From: Dr. David Alan Gilbert @ 2015-07-02  7:54 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Kevin Wolf, Yang Hongyang, Fam Zheng, Wen Congyang, qemu block,
	Stefan Hajnoczi, Jiang Yunhong, Dong Eddie,
	Dr. David Alan Gilbert, qemu devel, Lai Jiangshan, Gonglei,
	Stefan Hajnoczi, Paolo Bonzini, Max Reitz, zhanghailiang

* Wen Congyang (wency@cn.fujitsu.com) wrote:
> On 07/02/2015 02:42 AM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >> On 07/01/2015 04:11 PM, Dr. David Alan Gilbert wrote:
> >>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>>> On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
> >>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
> >>>>>> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
> >>>>>
> >>>>> <snip>
> >>>>>
> >>>>>>> Ah, I hadn't realised you could do that; so do you just do:
> >>>>>>>
> >>>>>>> migrate_set_parameter colo on
> >>>>>>> migrate -d -b tcp:otherhhost:port
> >>>>>>>
> >>>>>>> How does the secondary know to feed that data straight into the disk without
> >>>>>>> recording all the old data into the hidden-disk ?
> >>>>>>
> >>>>>> hidden disk and active disk will be made empty when starting block replication.
> >>>>>
> >>>>> Hmm, yes - I think I need to update to your current world; in the version
> >>>>> from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
> >>>>> if I try to use migrate -d -b 
> >>>>> (the bdrv_write fails)
> >>>>
> >>>> Can you give me both primary and secondary qemu's command? I think the command line is wrong,
> >>>> and disk migration fails.
> >>>>
> >>>
> >>> Primary:
> >>>
> >>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
> >>>      -boot c -m 4096 -smp 4 -S \
> >>>      -name debug-threads=on -trace events=trace-file \
> >>>      -netdev tap,id=hn0,script=$PWD/ifup-prim,\
> >>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
> >>>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
> >>>      -drive if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,\
> >>> cache=none,aio=native,\
> >>> children.0.file.filename=./bugzilla.raw,\
> >>> children.0.driver=raw,\
> >>> children.1.file.driver=nbd,\
> >>> children.1.file.host=ibpair,\
> >>> children.1.file.port=8889,\
> >>> children.1.file.export=colo1,\
> >>> children.1.driver=replication,\
> >>> children.1.mode=primary,\
> >>> children.1.ignore-errors=on
> >>
> >> Add id=nbd_target1 to primary disk option, and try it. Disk migration needs the same id to sync
> >> the disk.
> > 
> > Thank you! That worked nicely.
> > The only odd thing was that the hidden disk on the secondary went upto ~2GB in size during
> > the disk copy; (This is still on the version you posted at the end of may).
> > I don't really understand why it was 2GB - the disk was 40GB, and qemu-img tells me that
> > 2.6GB of it were used.   Still, it would be good to avoid the overhead of going through
> > the hidden disk on the secondary for the initial replication.
> 
> Yes, I have fixed it in v7: starting backup job later.

Thanks, I'll try it.

Dave

> 
> Thanks
> Wen Congyang
> 
> > 
> > Dave
> > 
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>>
> >>> Secondary:
> >>>
> >>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
> >>>      -boot c -m 4096 -smp 4 -S \
> >>>      -name debug-threads=on -trace events=trace-file \
> >>>      -netdev tap,id=hn0,script=$PWD/ifup-slave,\
> >>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
> >>>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
> >>>      -drive if=none,driver=raw,file=bugzilla.raw,id=nbd_target1,cache=none,aio=native \
> >>>      -drive if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=70000000,\
> >>> file.file.filename=/run/colo-active-disk.qcow2,\
> >>> file.driver=qcow2,\
> >>> file.backing_reference.drive_id=nbd_target1,\
> >>> file.backing_reference.hidden-disk.file.filename=/run/colo-hidden-disk.qcow2,\
> >>> file.backing_reference.hidden-disk.driver=qcow2,\
> >>> file.backing_reference.hidden-disk.allow-write-backing-file=on \
> >>>      -incoming tcp:0:8888
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Dave
> >>>
> >>>>>>>> If the user uses mirror job, we don't cancel the mirror job now.
> >>>>>>>
> >>>>>>> It would be good to get it to work with mirror, that seems preferred these
> >>>>>>> days to the old block migration.
> >>>>>>
> >>>>>> In normal migration, is mirror job created and cancelled by libvirt?
> >>>>>
> >>>>> Yes, I think so; you should be able to turn on full logging on libvirt and
> >>>>> watch the qmp commands it sends.
> >>>>
> >>>> Supporting mirror job in my TODO list now. But I think we should focus the basci function now.
> >>>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>> Dave
> >>>>>
> >>>>> --
> >>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>>> .
> >>>>>
> >>>>
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>> .
> >>>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target
  2015-07-02  7:54                                   ` Dr. David Alan Gilbert
@ 2015-07-02  8:05                                     ` Wen Congyang
  0 siblings, 0 replies; 50+ messages in thread
From: Wen Congyang @ 2015-07-02  8:05 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, Wen Congyang, qemu block, Stefan Hajnoczi,
	Jiang Yunhong, Dong Eddie, qemu devel, Max Reitz, Lai Jiangshan,
	Gonglei, Stefan Hajnoczi, Paolo Bonzini, Yang Hongyang,
	zhanghailiang

On 07/02/2015 03:54 PM, Dr. David Alan Gilbert wrote:
> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>> On 07/02/2015 02:42 AM, Dr. David Alan Gilbert wrote:
>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>> On 07/01/2015 04:11 PM, Dr. David Alan Gilbert wrote:
>>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>>> On 07/01/2015 03:01 AM, Dr. David Alan Gilbert wrote:
>>>>>>> * Wen Congyang (wency@cn.fujitsu.com) wrote:
>>>>>>>> On 06/27/2015 03:03 AM, Dr. David Alan Gilbert wrote:
>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>>> Ah, I hadn't realised you could do that; so do you just do:
>>>>>>>>>
>>>>>>>>> migrate_set_parameter colo on
>>>>>>>>> migrate -d -b tcp:otherhhost:port
>>>>>>>>>
>>>>>>>>> How does the secondary know to feed that data straight into the disk without
>>>>>>>>> recording all the old data into the hidden-disk ?
>>>>>>>>
>>>>>>>> hidden disk and active disk will be made empty when starting block replication.
>>>>>>>
>>>>>>> Hmm, yes - I think I need to update to your current world; in the version
>>>>>>> from the end of May, I get a 'error while loading state for instance 0x0 of device 'block''
>>>>>>> if I try to use migrate -d -b 
>>>>>>> (the bdrv_write fails)
>>>>>>
>>>>>> Can you give me both primary and secondary qemu's command? I think the command line is wrong,
>>>>>> and disk migration fails.
>>>>>>
>>>>>
>>>>> Primary:
>>>>>
>>>>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
>>>>>      -boot c -m 4096 -smp 4 -S \
>>>>>      -name debug-threads=on -trace events=trace-file \
>>>>>      -netdev tap,id=hn0,script=$PWD/ifup-prim,\
>>>>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
>>>>>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
>>>>>      -drive if=virtio,driver=quorum,read-pattern=fifo,no-connect=on,\
>>>>> cache=none,aio=native,\
>>>>> children.0.file.filename=./bugzilla.raw,\
>>>>> children.0.driver=raw,\
>>>>> children.1.file.driver=nbd,\
>>>>> children.1.file.host=ibpair,\
>>>>> children.1.file.port=8889,\
>>>>> children.1.file.export=colo1,\
>>>>> children.1.driver=replication,\
>>>>> children.1.mode=primary,\
>>>>> children.1.ignore-errors=on
>>>>
>>>> Add id=nbd_target1 to primary disk option, and try it. Disk migration needs the same id to sync
>>>> the disk.
>>>
>>> Thank you! That worked nicely.
>>> The only odd thing was that the hidden disk on the secondary went upto ~2GB in size during
>>> the disk copy; (This is still on the version you posted at the end of may).
>>> I don't really understand why it was 2GB - the disk was 40GB, and qemu-img tells me that
>>> 2.6GB of it were used.   Still, it would be good to avoid the overhead of going through
>>> the hidden disk on the secondary for the initial replication.
>>
>> Yes, I have fixed it in v7: starting backup job later.
> 
> Thanks, I'll try it.

Note: the usage is changed, and you can get the usage from the wiki:
http://wiki.qemu.org/Features/COLO

Thanks
Wen Congyang

> 
> Dave
> 
>>
>> Thanks
>> Wen Congyang
>>
>>>
>>> Dave
>>>
>>>> Thanks
>>>> Wen Congyang
>>>>
>>>>>
>>>>>
>>>>> Secondary:
>>>>>
>>>>> ./try/bin/qemu-system-x86_64 -enable-kvm -nographic \
>>>>>      -boot c -m 4096 -smp 4 -S \
>>>>>      -name debug-threads=on -trace events=trace-file \
>>>>>      -netdev tap,id=hn0,script=$PWD/ifup-slave,\
>>>>> downscript=no,colo_script=$PWD/qemu/scripts/colo-proxy-script.sh,colo_nicname=em4 \
>>>>>      -device e1000,mac=9c:da:4d:1c:b5:89,id=net-pci0,netdev=hn0 \
>>>>>      -drive if=none,driver=raw,file=bugzilla.raw,id=nbd_target1,cache=none,aio=native \
>>>>>      -drive if=virtio,driver=replication,mode=secondary,export=colo1,throttling.bps-total-max=70000000,\
>>>>> file.file.filename=/run/colo-active-disk.qcow2,\
>>>>> file.driver=qcow2,\
>>>>> file.backing_reference.drive_id=nbd_target1,\
>>>>> file.backing_reference.hidden-disk.file.filename=/run/colo-hidden-disk.qcow2,\
>>>>> file.backing_reference.hidden-disk.driver=qcow2,\
>>>>> file.backing_reference.hidden-disk.allow-write-backing-file=on \
>>>>>      -incoming tcp:0:8888
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dave
>>>>>
>>>>>>>>>> If the user uses mirror job, we don't cancel the mirror job now.
>>>>>>>>>
>>>>>>>>> It would be good to get it to work with mirror, that seems preferred these
>>>>>>>>> days to the old block migration.
>>>>>>>>
>>>>>>>> In normal migration, is mirror job created and cancelled by libvirt?
>>>>>>>
>>>>>>> Yes, I think so; you should be able to turn on full logging on libvirt and
>>>>>>> watch the qmp commands it sends.
>>>>>>
>>>>>> Supporting mirror job in my TODO list now. But I think we should focus the basci function now.
>>>>>>
>>>>>> Thanks
>>>>>> Wen Congyang
>>>>>>
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>>> .
>>>>>>>
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>> .
>>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2015-07-02  8:02 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-18  8:49 [Qemu-devel] [PATCH COLO-Block v6 00/16] Block replication for continuous checkpoints Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 01/16] docs: block replication's description Wen Congyang
2015-06-18 10:34   ` Stefan Hajnoczi
2015-06-18 10:51     ` Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 02/16] allow writing to the backing file Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 03/16] Allow creating backup jobs when opening BDS Wen Congyang
2015-06-18 10:40   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 04/16] block: Parse "backing_reference" option to reference existing BDS Wen Congyang
2015-06-18 10:50   ` Stefan Hajnoczi
2015-06-18 11:40     ` Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 05/16] Backup: clear all bitmap when doing block checkpoint Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 06/16] Don't allow a disk use backing reference target Wen Congyang
2015-06-18 12:47   ` Stefan Hajnoczi
2015-06-18 15:17     ` Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 07/16] Add new block driver interface to connect/disconnect the remote target Wen Congyang
2015-06-18 12:55   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-06-18 14:36     ` Wen Congyang
2015-06-18 16:06       ` Stefan Hajnoczi
2015-06-19  0:54         ` Wen Congyang
2015-06-19 10:49           ` Stefan Hajnoczi
2015-06-20  3:31             ` Wen Congyang
2015-06-22 12:39               ` Stefan Hajnoczi
2015-06-22 13:56                 ` Wen Congyang
2015-06-23 13:42                   ` Stefan Hajnoczi
2015-06-23 13:54                     ` Wen Congyang
2015-06-24 14:07               ` Dr. David Alan Gilbert
2015-06-25  1:01                 ` Wen Congyang
2015-06-26 19:03                   ` Dr. David Alan Gilbert
2015-06-29  1:05                     ` Wen Congyang
2015-06-30 19:01                       ` Dr. David Alan Gilbert
2015-07-01  1:01                         ` Wen Congyang
2015-07-01  8:11                           ` Dr. David Alan Gilbert
2015-07-01  8:18                             ` Wen Congyang
2015-07-01 18:42                               ` Dr. David Alan Gilbert
2015-07-02  0:55                                 ` Wen Congyang
2015-07-02  7:54                                   ` Dr. David Alan Gilbert
2015-07-02  8:05                                     ` Wen Congyang
2015-06-24  1:11             ` Wen Congyang
2015-06-23  6:44         ` Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 08/16] NBD client: implement block driver interfaces to connect/disconnect NBD server Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 09/16] Introduce a new -drive option to control whether to connect to remote target Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 10/16] NBD client: connect to nbd server later Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 11/16] Add new block driver interfaces to control block replication Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 12/16] skip nbd_target when starting " Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 13/16] quorum: implement block driver interfaces for " Wen Congyang
2015-06-18 13:06   ` Stefan Hajnoczi
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 14/16] introduce a new API qemu_opts_absorb_qdict_by_index() Wen Congyang
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 15/16] quorum: allow ignoring child errors Wen Congyang
2015-06-18 13:06   ` Stefan Hajnoczi
2015-06-18  8:49 ` [Qemu-devel] [PATCH COLO-Block v6 16/16] Implement new driver for block replication Wen Congyang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.